Hortonworks University. Education Catalog 2018 Q1


Revised 03/13/2018

TABLE OF CONTENTS

About Hortonworks University
Training Delivery Options
Available Courses List
Blended Learning
Self-Paced Learning Library
Hortonworks Hadoop Essentials
HDP Operations: Hadoop Admin Foundations
HDP Operations: HDP Administration Fast Track
HDF Operations: HDF NiFi Flow Management
HDP Operations: Hadoop Admin II
HDP Operations: Hadoop Security
HDP Operations: HBase Management
HDP Developer: Developer Quick Start
HDP Developer: Apache Pig and Hive
HDP Developer: Java
HDP Developer: HBase Essentials
HDP Developer: Apache Storm and Trident
HDP Developer: YARN Applications
HDP Developer: Spark 2.x Developer
HDP Developer: Real-Time Development
HDP Analyst: Data Science Foundations

ABOUT HORTONWORKS UNIVERSITY

Hortonworks University provides an immersive and valuable real-world experience through scenario-based training. Courses are offered as public, private on-site, and virtual instructor-led deliveries, through the Self-Paced Learning Library, and through an Academic program. All courses include industry-leading lecture and hands-on labs.

INDIVIDUALIZED LEARNING PATHS
Hortonworks University offers individualized Learning Paths that allow for targeted learning based on your needs and interests. We currently offer Learning Paths for users who are new to Hadoop, Developers, Administrators, and Analysts.

New to Hadoop? Our courses introduce you to high-level concepts, architecture, operation, and uses of HDP, and are a good starting point for your Big Data journey.

Developers: Our courses for Developers introduce topics including developing data applications, extensions, real-time solutions, and architecture.

Administrators: Our courses for Administrators introduce topics such as installing, managing, monitoring, advanced operations, security, and governance.

Analysts: Our courses for Analysts introduce topics such as SQL, scripting languages, machine learning, Data Science, Big Data analytics, and AI.

LEARN FROM THE EXPERTS
Hortonworks University offers a wide variety of live instructor-led courses in order to provide comprehensive real-world training. Each course is taught by a Certified Hortonworks Instructor and includes a combination of instructor-led lecture, classroom discussion, and comprehensive hands-on lab exercises.

PUBLIC AND PRIVATE TRAINING
Classes are available as both public and private deliveries. Private deliveries are available for groups of 6 or more students and can be delivered either on-site or virtually. Please contact us for additional information.

TRAINING DELIVERY OPTIONS

At Hortonworks University, we realize that everyone has a different learning style and a different schedule. As a result, we strive to offer several delivery options for many of our classes, including: Live Instructor-Led, the Self-Paced Learning Library, and Blended Learning. Below is a high-level overview of each delivery option:

Live Instructor-Led: Most Hortonworks University live instructor-led courses are available both in in-person classroom settings and as virtual instructor-led training. Virtual courses are delivered via WebEx and include instructor-led discussion and lecture, and comprehensive hands-on lab exercises.

Self-Paced Learning Library: The Hortonworks University Self-Paced Learning Library is an on-demand learning library that is accessed using a Hortonworks University account. Users can view lessons anywhere, at any time, and complete lessons at their own pace.

Blended Learning: Hortonworks University Blended Learning is a unique offering that combines the flexibility of self-paced learning with the benefits of live instructor-led sessions. Each blended learning course includes lessons and hands-on lab exercises that students can complete at their own convenience. In addition, each course has regularly scheduled one-to-two-hour live online micro sessions.

Additional information on the Self-Paced Learning Library and Blended Learning is provided later in this catalog.

AVAILABLE COURSES

Listed below are the various courses we currently offer for each Learning Path. Detailed information related to each course can be found on the course pages in this catalog.

COURSE NUMBER   TITLE                                        LEVEL                  PATH
HDP-123         Hortonworks Hadoop Essentials                General Audience       General Audience
ADM-221         HDP Operations: Hadoop Admin Foundations     Foundations            Administrator
                HDP Operations: Administration Fast Track    Foundations            Administrator
ADM-301         HDF Operations: HDF NiFi Flow Management     Subject Matter Expert  Administrator
ADM-303         HDP Operations: Hadoop Admin II              Subject Matter Expert  Administrator
ADM-351         HDP Operations: Hadoop Security              Subject Matter Expert  Administrator
ADM-305         HDP Operations: HBase Management             Subject Matter Expert  Administrator
DEV-201         HDP Developer: Developer Quick Start         Subject Matter Expert  Developer
DEV-203         HDP Developer: Apache Pig and Hive           Subject Matter Expert  Developer
DEV-301         HDP Developer: Java                          Subject Matter Expert  Developer
DEV-303         HDP Developer: HBase Essentials              Subject Matter Expert  Developer
DEV-305         HDP Developer: Apache Storm and Trident      Subject Matter Expert  Developer
DEV-307         HDP Developer: YARN Applications             Subject Matter Expert  Developer
DEV-343         HDP Developer: Spark 2.x Developer           Subject Matter Expert  Developer
                HDP Developer: Real-Time Development         Subject Matter Expert  Developer
SCI-221         HDP Analyst: Data Science Foundations        Foundations            Analyst

Hortonworks University Blended Learning

TRAINING OFFERING: HORTONWORKS UNIVERSITY BLENDED LEARNING

OVERVIEW
Hortonworks University Blended Learning is a unique offering that combines the flexibility of self-paced learning with the benefits of live instructor-led sessions. Each Hortonworks Blended course includes lessons and hands-on lab exercises that students can complete at their own convenience. In addition, each course has regularly scheduled one-to-two-hour live online micro sessions to further enhance and supplement the learning experience. Pricing for each course is based on the number of credit hours associated with the course. Students have the option to purchase courses individually rather than an entire library subscription, allowing for targeted learning of specific topics. Courses are updated on an ongoing basis to ensure that the latest information is available, allowing students to keep pace with the ever-changing field of Big Data.

LIVE MICRO SESSIONS
To further enhance your learning experience, one-to-two-hour live micro sessions are scheduled approximately each week. During each session, Hortonworks University instructors discuss various topics related to the course, and students have the opportunity to discuss any course-related questions with their instructor. These sessions are also available in recorded format afterward and can be viewed at any time. Please note that the live micro session schedule is subject to change and instructor availability.

FLEXIBLE PURCHASE OPTIONS
In order to help you meet your learning goals, we offer two options for purchasing courses:

Individual Courses: Courses can be purchased individually, with pricing for each course based on the number of credit hours for that course.

By Learning Path: Courses can be purchased by path. Currently, the Administrator path is available and contains HDP Administration Foundations, Hortonworks NiFi Flow Management, and HDP Security.

DURATION
Access to each Hortonworks University Blended course varies depending on the type of purchase. Courses purchased individually can be accessed via a 6-month subscription period. Courses purchased as part of a learning track can be accessed via a 12-month subscription period. Subscriptions are valid for one individual named user. Any additional users who wish to access course content or attend live micro sessions will be required to purchase a subscription.

TARGET AUDIENCE
Hortonworks University Blended courses are designed for architects, developers, analysts, data scientists, and IT decision makers, as well as those new to Hadoop: essentially, anyone with a need or desire to learn more about Apache Hadoop and the Hortonworks Data Platform framework.

HANDS-ON LABS
Hortonworks University Blended courses include hands-on lab exercises written by Big Data experts. The labs allow students to further reinforce their learning by providing an opportunity to perform real-world tasks. These labs are completed using two different options for lab environments:

Downloadable Virtual Machines: For some courses, students are provided with a downloadable virtual machine image. This allows lab environments to be run locally on software such as VMware Player or VirtualBox.

Amazon AWS Images: For some courses, students are provided with access to Amazon Machine Images (AMIs). In this scenario, students are required to have their own Amazon AWS account.

Detailed setup instructions are provided for each lab environment.

BLENDED LEARNING COURSES
The following courses are currently available for Hortonworks University Blended Learning:

DEV-BL-203: Developer Pig & Hive (4 Credit Hours)
DEV-BL-343: Spark Developer (4 Credit Hours)
ADM-BL-221: HDP Operations: Administration Foundation/Core (4 Credit Hours)
ADM-BL-351: HDP Operations: Security (3 Credit Hours)
ADM-BL-301: NiFi Flow Management (3 Credit Hours)

ACCESSING BLENDED LEARNING COURSES
To access and purchase a learning track or a Hortonworks University Blended course, visit: learn.hortonworks.com

PREREQUISITES
The prerequisites for each Hortonworks University Blended course vary. Visit training.hortonworks.com for additional information regarding each course.

ABOUT HORTONWORKS UNIVERSITY
Hortonworks University is your expert source for Apache Hadoop training and certification. Public and private on-site courses are available for developers, administrators, data analysts, and other IT professionals involved in implementing big data solutions. Classes combine presentation material with industry-leading hands-on labs that fully prepare students for real-world Hadoop deployments.

REVISED 02/15/2018

Hortonworks University Self-Paced Learning Library

TRAINING OFFERING: HORTONWORKS UNIVERSITY SELF-PACED LEARNING LIBRARY

OVERVIEW
The Hortonworks University Self-Paced Learning Library is an on-demand, online learning repository that is accessed using a Hortonworks University account. Learners can view lessons anywhere, at any time, and complete lessons at their own pace. Lessons can be stopped and started as needed, and completion is tracked via the Hortonworks University Learning Management System. This learning library makes it easy for Hadoop administrators, data analysts, and developers to continuously learn and stay up-to-date on the Hortonworks Data Platform. Hortonworks University courses are designed and developed by Hadoop experts and provide an immersive and valuable real-world experience. In our scenario-based training courses, we offer unmatched depth and expertise. We prepare you to be an expert with highly valued, practical skills, and to successfully complete Hortonworks technical certifications. The Self-Paced Learning Library accelerates time to Hadoop competency. In addition, the learning library is constantly being expanded, with new content added on an ongoing basis.

DURATION
Access to the Hortonworks University Self-Paced Learning Library is provided for a 12-month subscription period per individual named user.

TARGET AUDIENCE
The Hortonworks University Self-Paced Learning Library is designed for architects, developers, analysts, data scientists, and IT decision makers, as well as those new to Hadoop: essentially, anyone with a need or desire to learn more about Apache Hadoop and the Hortonworks Data Platform framework.

SELF-PACED LEARNING CONTENT

LEARNING PATHS
The following courses are available in each learning path in the Self-Paced Learning Library:

HDP Overview Learning Path
  Apache Hadoop Essentials

HDP Developer Learning Path
  Apache Pig and Hive
  Developing Applications with Java
  Developing Custom YARN Applications
  Storm and Trident Fundamentals
  Spark Developer 2.x
  HBase Essentials

HDP Operations Learning Path
  Hadoop Administration Foundations
  Hadoop Administration 2
  HDF NiFi Flow Management
  Apache HBase Advanced Management
  Hadoop Security

HDP Analyst Learning Path
  Data Science

ACCESSING THE SELF-PACED LEARNING LIBRARY
To access and purchase a subscription to the Hortonworks University Self-Paced Learning Library, visit:

PREREQUISITES
There are no prerequisites for the Hortonworks University Self-Paced Learning Library.

ABOUT HORTONWORKS
Hortonworks develops, distributes, and supports the only 100% open source distribution of Apache Hadoop explicitly architected, built, and tested for enterprise-grade deployments.

REVISED 2/15/

Hortonworks University Live Instructor-Led Courses

TRAINING OFFERING
HDP-123 HORTONWORKS HADOOP ESSENTIALS
1 DAY / GENERAL AUDIENCE

This course provides a technical overview of Apache Hadoop. It includes high-level information about concepts, architecture, operation, and uses of the Hortonworks Data Platform (HDP) and the Hadoop ecosystem. The course provides an optional primer for those who plan to attend a hands-on, instructor-led course.

PREREQUISITES
No previous Hadoop or programming knowledge is required. Students will need browser access to the Internet.

TARGET AUDIENCE
Data architects, data integration architects, managers, C-level executives, decision makers, technical infrastructure team members, and Hadoop administrators or developers who want to understand the fundamentals of Big Data and the Hadoop ecosystem.

FORMAT
100% Lecture/Instructor Discussion

AGENDA SUMMARY
Day 1: Hadoop Overview and Demonstrations

OBJECTIVES
Describe the case for Hadoop
Identify the Hadoop ecosystem and architecture:
  Data Management: HDFS, YARN
  Data Access: Pig, Hive, HBase, Storm, Solr, Spark
  Data Governance & Integration: Falcon, Flume, Sqoop, Kafka, Atlas
  Security: Kerberos, Falcon, Knox
  Operations: Ambari, ZooKeeper, Oozie, Cloudbreak
Observe popular data transformation and processing engines in action: Apache Hive, Apache Pig, Apache Spark
Detail the architecture and features of YARN
Describe backup and recovery options
Describe how to secure Hadoop
Explain the fundamentals of parallel processing
Describe data ingestion options and frameworks for batch and real-time streaming
Detail the HDFS architecture

DEMONSTRATIONS
Operational Overview with Ambari
Ingesting Data into HDFS
Data Manipulation with Hive
Risk Factor Analysis with Pig
Risk Analysis with Spark
Securing Hive with Ranger

Revised 10/26/

TRAINING OFFERING
ADM-221 HDP HADOOP ADMINISTRATION FOUNDATIONS
4 DAYS / FOUNDATIONS

This course is intended for systems administrators who will be responsible for the design, installation, configuration, and management of the Hortonworks Data Platform (HDP). The course provides in-depth knowledge and experience in using Apache Ambari as the operational management platform for HDP. This course presumes no prior knowledge of or experience with Hadoop.

PREREQUISITES
Students must have experience working in a Linux environment with standard Linux system commands. Students should be able to read and execute basic Linux shell scripts. Basic knowledge of SQL statements is recommended, but not required. In addition, it is recommended that students have some operational experience in data center practices, such as change management, release management, incident management, and problem management. It is also strongly recommended that you complete HDP-123 Hortonworks Hadoop Essentials before taking this course.

TARGET AUDIENCE
Linux administrators and system operators responsible for installing, configuring, and managing an HDP cluster.

FORMAT
50% Lecture, 50% Hands-on Labs

AGENDA SUMMARY
Day 1: Introduction to Big Data, Hadoop, and the Hortonworks Data Platform
Day 2: Managing HDFS Storage, Rack Awareness, HDFS Snapshots, and HDFS Centralized Cache
Day 3: Introduction to YARN
Day 4: High Availability with HDP, Deploying HDP with Blueprints, and the HDP Upgrade Process

DAY 1 OBJECTIVES
Describe Apache Hadoop
Summarize the Purpose of the Hortonworks Data Platform Software Frameworks
List Hadoop Cluster Management Choices
Describe Apache Ambari
Identify Hadoop Cluster Deployment Options
Plan for a Hadoop Cluster Deployment
Perform an Interactive HDP Installation Using Apache Ambari
Install Apache Ambari
Describe the Differences Between Hadoop Users, Hadoop Service Owners, and Apache Ambari Users
Manage Users, Groups, and Permissions
Identify Hadoop Configuration Files
Summarize Operations of the Web UI Tool
Manage Hadoop Service Configuration Properties Using the Apache Ambari Web UI
Describe the Hadoop Distributed File System (HDFS)
Perform HDFS Shell Operations
Use WebHDFS
Protect Data Using HDFS Access Control Lists (ACLs)

DAY 1 LABS
Setting Up the Environment
Installing HDP
Managing Ambari Users and Groups
Managing Hadoop Services
Using HDFS Storage
Using WebHDFS
Using HDFS Access Control Lists
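Day 1 closes with HDFS shell operations, WebHDFS, and HDFS ACLs. As a minimal command sketch, assuming a running HDP cluster; the /data/sales path, host name, and user/group names are illustrative, not from the course:

```shell
# Basic HDFS shell operations: create a directory, load a file, list it
hdfs dfs -mkdir -p /data/sales
hdfs dfs -put sales.csv /data/sales/
hdfs dfs -ls /data/sales

# The same listing over WebHDFS (REST, served from the NameNode HTTP port);
# host, port, and user.name are illustrative
curl -s "http://namenode.example.com:50070/webhdfs/v1/data/sales?op=LISTSTATUS&user.name=admin"

# Protect data with an HDFS ACL: grant a hypothetical analysts group read/execute
hdfs dfs -setfacl -m group:analysts:r-x /data/sales
hdfs dfs -getfacl /data/sales
```

The lab exercises use Ambari-managed clusters; commands like these run from any cluster node with the HDFS client installed.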

DAY 2 OBJECTIVES
Describe HDFS Architecture and Operation
Manage HDFS Using the Ambari Web, NameNode, and DataNode UIs
Manage HDFS Using Command-line Tools
Summarize the Purpose and Benefits of Rack Awareness
Configure Rack Awareness
Summarize Hadoop Backup Considerations
Enable and Manage HDFS Snapshots
Copy Data Using DistCp
Use Snapshots and DistCp Together
Identify the Purpose and Operation of Heterogeneous HDFS Storage
Summarize the Purpose and Operation of HDFS Centralized Caching
Configure HDFS Centralized Cache
Define and Manage Cache Pools and Cache Directives
Identify HDFS NFS Gateway Use Cases
Recall HDFS NFS Gateway Architecture and Operation
Install and Configure an HDFS NFS Gateway
Configure an HDFS NFS Gateway Client

DAY 2 LABS
Managing HDFS Storage
Managing HDFS Quotas
Configuring Rack Awareness
Managing HDFS Snapshots
Using DistCp
Configuring HDFS Storage Policies
Configuring HDFS Centralized Cache
Configuring an NFS Gateway
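The snapshot and DistCp objectives above combine naturally: a snapshot gives DistCp an immutable source to copy. A sketch, assuming HDFS admin privileges; directory, snapshot, pool names, and both NameNode addresses are illustrative:

```shell
# Enable snapshots on a directory (dfsadmin requires the HDFS superuser), then take one
hdfs dfsadmin -allowSnapshot /data/sales
hdfs dfs -createSnapshot /data/sales nightly

# Copy the read-only snapshot to a second cluster with DistCp
hadoop distcp \
  hdfs://prod-nn.example.com:8020/data/sales/.snapshot/nightly \
  hdfs://dr-nn.example.com:8020/backups/sales

# Centralized caching: define a cache pool, then pin the hot directory into it
hdfs cacheadmin -addPool sales-pool
hdfs cacheadmin -addDirective -path /data/sales -pool sales-pool
```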

DAY 3 OBJECTIVES
Describe YARN Resource Management
Summarize YARN Architecture and Operation
Identify and Use YARN Management Options
Summarize YARN Response to Component Failure
Understand the Basics of Running Simple YARN Applications
Summarize the Purpose and Operation of the YARN Capacity Scheduler
Configure and Manage YARN Queues
Control Access to YARN Queues
Summarize the Purpose and Operation of YARN Node Labels
Describe the Process Used to Create Node Labels
Describe the Process Used to Add, Modify, and Remove Node Labels
Configure Queues to Access Node Label Resources
Run Test Jobs to Confirm Node Label Behavior

DAY 3 LABS
Managing YARN Using Ambari
Managing YARN Using the CLI
Running Sample YARN Applications
Setting Up for the Capacity Scheduler
Managing YARN Containers and Queues
Managing YARN ACLs and User Limits
Working with YARN Node Labels
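A few of the Day 3 tasks can be previewed from the command line. The yarn CLI commands below are standard; the two-queue split in the comments is a hypothetical example of the capacity-scheduler.xml properties that the labs manage through Ambari:

```shell
# Inspect the cluster, a queue, and running applications
yarn node -list
yarn queue -status default
yarn application -list -appStates RUNNING

# Capacity Scheduler queues live in capacity-scheduler.xml; an illustrative
# dev/prod split could be expressed with properties such as:
#   yarn.scheduler.capacity.root.queues                      = dev,prod
#   yarn.scheduler.capacity.root.dev.capacity                = 30
#   yarn.scheduler.capacity.root.prod.capacity               = 70
#   yarn.scheduler.capacity.root.dev.acl_submit_applications = devusers
```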

DAY 4 OBJECTIVES
Summarize the Purpose of NameNode HA
Configure NameNode HA Using Ambari
Summarize the Purpose of ResourceManager HA
Configure ResourceManager HA Using Apache Ambari
Identify Reasons to Add, Replace, and Delete Worker Nodes
Demonstrate How to Add a Worker Node
Configure and Run the HDFS Balancer
Decommission and Re-commission a Worker Node
Describe the Process of Moving a Master Component
Summarize the Purpose and Operation of Apache Ambari Metrics
Describe the Features and Benefits of the Apache Ambari Dashboard
Summarize the Purpose and Benefits of Apache Ambari Blueprints
Recall the Process Used to Deploy a Cluster Using Ambari Blueprints
Recall the Definition of an HDP Stack and Interpret its Version Number
View the Current Stack and Identify Compatible Apache Ambari Software Versions
Recall the Types of and Methods for Upgrades Available in HDP
Describe the Upgrade Process, Restrictions, and Pre-upgrade Checklist
Perform an Upgrade Using the Apache Ambari Web UI

DAY 4 LABS
Configuring NameNode HA
Configuring ResourceManager HA
Adding, Decommissioning, and Re-commissioning a Worker Node
Configuring Ambari Alerts
Deploying an HDP Cluster Using Ambari Blueprints
Performing an HDP Express Upgrade

Revised 10/26/
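Ambari Blueprints, covered on Day 4, describe a cluster topology as JSON registered through the Ambari REST API. A minimal sketch; the blueprint name, host, credentials, and single-host topology are illustrative only:

```shell
# Register a tiny, illustrative blueprint with the Ambari REST API
curl -u admin:admin -H 'X-Requested-By: ambari' -X POST \
  -d '{
        "Blueprints": { "stack_name": "HDP", "stack_version": "2.6" },
        "host_groups": [
          { "name": "master", "cardinality": "1",
            "components": [ { "name": "NAMENODE" }, { "name": "RESOURCEMANAGER" } ] }
        ]
      }' \
  http://ambari.example.com:8080/api/v1/blueprints/hdp-small
```

A cluster is then instantiated by POSTing a separate host-mapping document to the clusters endpoint; the lab walks through the full flow in Ambari.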

TRAINING OFFERING
HORTONWORKS DATA PLATFORM (HDP) ADMINISTRATION FAST TRACK
5 DAYS / FOUNDATIONS

This 5-day training course is designed primarily for systems administrators and platform architects who need to understand HDP cluster capabilities and manage HDP clusters. Topics include: understanding HDP capabilities, Apache Hadoop, Apache YARN, HDFS, and other Hadoop ecosystem components. Students will learn how to administer, manage, and monitor HDP clusters.

PREREQUISITES
Students should be familiar with server or platform software concepts and have a basic understanding of system administration.

TARGET AUDIENCE
Students ranging from those familiar with server software concepts to system administrators and platform architects who plan on administering HDP clusters.

FORMAT
50% Lecture/Discussion, 50% Hands-on Labs

AGENDA SUMMARY
Day 1: Introduction to Hadoop and Ambari
Day 2: Managing HDFS, YARN Architecture and Management
Day 3: The YARN Capacity Scheduler, High Availability, Monitoring, and Backups
Day 4: Advanced HDFS & YARN Services
Day 5: Additional HDP Components and Tuning

DAY 1 OBJECTIVES
Describe Apache Hadoop
Summarize the Purpose of the Hortonworks Data Platform Software Frameworks
List Hadoop Cluster Management Choices
Describe Apache Ambari
Identify Hadoop Cluster Deployment Options
Plan for Hadoop Cluster Deployments
Perform an Interactive HDP Installation Using Apache Ambari
Install Apache Ambari
Describe the Differences Between Hadoop Users, Hadoop Service Owners, and Ambari Users
Manage Users, Groups, and Permissions
Identify Hadoop Configuration Files
Summarize Operations of the Web UI Tool
Manage Hadoop Service Configuration Properties Using the Ambari Web UI
Manage Client Configuration Files Using the Command-line Interface

DAY 1 LABS
Setting Up the Lab Environment
Installing HDP
Managing Apache Ambari Users and Groups
Managing Hadoop Services

DAY 2 OBJECTIVES
Describe the Hadoop Distributed File System (HDFS)
Perform HDFS Shell Operations
Use the Ambari Files View
Use WebHDFS
Protect Data Using HDFS Access Control Lists (ACLs)
Describe HDFS Architecture and Operation
Manage HDFS Using the Ambari Web, NameNode, and DataNode UIs
Manage HDFS Using Command-line Tools
Enable and Manage HDFS Quotas
Identify Reasons to Add, Replace, and Delete Worker Nodes
Configure and Run the HDFS Balancer
Decommission and Re-commission a Worker Node
Move a Master Component
Summarize the Purpose and Benefits of Rack Awareness
Configure Rack Awareness

DAY 2 LABS
Using Hadoop Storage
Using WebHDFS
Using HDFS Access Control Lists
Managing Hadoop Storage
Managing HDFS Quotas
Adding, Decommissioning, and Re-commissioning Worker Nodes
Configuring Rack Awareness

DAY 3 OBJECTIVES
Describe YARN Resource Management
Summarize YARN Architecture and Operation
Identify and Use YARN Management Options
Summarize YARN Response to Component Failure
Understand the Basics of Running a Sample YARN Application, Including:
  MapReduce and Tez
  Apache Pig
  Apache Hive
Summarize the Purpose and Operation of the YARN Capacity Scheduler
Configure and Manage YARN Queues
Control Access to YARN Queues

DAY 3 LABS
Managing the YARN Service Using the Apache Ambari Web UI
Managing the YARN Service Using CLI Commands
Running Sample YARN Applications
Setting Up for the Capacity Scheduler
Managing YARN Containers and Queues
Managing YARN ACLs and User Limits
Working with YARN Node Labels

DAY 4 OBJECTIVES
Summarize the Purpose of NameNode HA
Configure NameNode HA Using Ambari
Summarize the Purpose of ResourceManager HA
Configure ResourceManager HA Using Ambari
Summarize the Purpose and Operation of Ambari Metrics
Describe the Features and Benefits of the Ambari Dashboard
Summarize the Purpose and Operation of Ambari Alerts
Configure Ambari Alerts
Summarize Hadoop Backup Considerations
Enable and Manage HDFS Snapshots
Copy Data Using DistCp
Use Snapshots and DistCp Together
Identify the Purpose and Operation of Heterogeneous HDFS Storage
Identify HDFS NFS Gateway Use Cases
Install and Configure an HDFS NFS Gateway
Summarize the Purpose and Operation of HDFS Centralized Caching

DAY 4 LABS
Configuring NameNode High Availability
Configuring ResourceManager High Availability
Managing Apache Ambari Alerts
Managing HDFS Snapshots
Using DistCp
Configuring HDFS Storage Policies
Configuring an NFS Gateway
Configuring HDFS Centralized Cache

DAY 5 OBJECTIVES
Configure YARN Queues, Tez, and Hive Properties to Support Performance Goals
Recall Basic Facts About Hive and the Hive Architecture
Recall the Requirements and Benefits of Hive HA
Summarize the Hive HA Architecture and Operation
Configure and Test Hive HA
Recall the Purpose, Job Types, Structure, and Benefits of Oozie
Install and Configure Oozie Using Ambari
Deploy and Manage a Sample Oozie Workflow
Identify Characteristics of Ambari Local Versus LDAP Users and Groups
Integrate Ambari Server with LDAP
Summarize the Purpose and Benefits of Ambari Blueprints
Recall the Process Used to Deploy a Cluster Using Ambari Blueprints
Configure Ambari Blueprints Logical Cluster Configuration Files
Recall the Definition of an HDP Stack and Interpret its Version Number
View the Current Stack and Identify Compatible Ambari Software Versions
Recall the Types and Methods of Upgrades Available in HDP
Describe the Rolling Upgrade Process, Restrictions, and Pre-Upgrade Checklist
Perform a Rolling Upgrade Using the Ambari Web UI

DAY 5 LABS
Configuring Apache Hive High Availability
Managing Workflows Using Apache Oozie
Integrating Apache Ambari with AD/LDAP
Automating Cluster Provisioning Using Apache Ambari
Performing an HDP Upgrade

Revised 02/27/
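The Ambari/LDAP integration on Day 5 is driven by ambari-server subcommands. A sketch; the setup step prompts interactively for your LDAP URL, base DN, and bind credentials:

```shell
# Configure Ambari Server authentication against AD/LDAP (interactive prompts)
ambari-server setup-ldap

# Restart so the new authentication settings take effect
ambari-server restart

# Synchronize users and groups from LDAP into Ambari
ambari-server sync-ldap --all
```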

TRAINING OFFERING
ADM-301 HDF NiFi FLOW MANAGEMENT
3 DAYS / SUBJECT MATTER EXPERT

This course is designed for data stewards and data flow managers who are looking to automate the flow of data between systems. Topics include: an introduction to NiFi, installing and configuring NiFi, a detailed explanation of the NiFi user interface and of its components and associated elements, how to build a dataflow, the NiFi Expression Language, NiFi clustering, data provenance, security around NiFi, monitoring tools, and HDF best practices.

PREREQUISITES
Students should be familiar with programming principles and have previous experience in software development. Experience with Linux and a basic understanding of dataflow tools would be helpful. No prior Hadoop/NiFi experience is required, but it is very helpful.

TARGET AUDIENCE
Data engineers, integration engineers, and architects who are looking to automate data flow between systems.

FORMAT
50% Lecture/Discussion, 50% Hands-on Labs

AGENDA SUMMARY
Day 1: Introduction to HDF 3.0 (NiFi), Architecture and Features, the NiFi UI, Process Groups
Day 2: Remote Process Groups, Attributes and Templates, Expression Language, Provenance, Clustering
Day 3: HDF and HDP; Security with LDAP, SSL, and Kerberos; File- and Ranger-Based Authorization

DAY 1 OBJECTIVES
Introduction to Enterprise Data Flow
What's New in HDF 3.0
HDF 3.0 Architecture and Features
HDF System Requirements
Install and Configure HDF (NiFi)
Describe the NiFi User Interface in Detail
Describe the NiFi UI Summary and History Sections
Describe the Anatomy of a Processor
Describe the Anatomy of a Connection
Describe Controller Services and Reporting Tasks
Learn How to Build a NiFi Data Flow
Command and Control of a NiFi Data Flow
Describe the Anatomy of a Process Group

DAY 1 LABS AND DEMONSTRATIONS
Installing and Starting HDF with Ambari
Demonstration: The NiFi User Interface in Detail
Building a NiFi Dataflow
Working with NiFi Process Groups

About Hortonworks
Hortonworks is a leading innovator at creating, distributing, and supporting enterprise-ready open data platforms. Our mission is to manage the world's data. We have a single-minded focus on driving innovation in open source communities such as Apache Hadoop, NiFi, and Spark. Our open Connected Data Platforms power Modern Data Applications that deliver actionable intelligence from all data: data-in-motion and data-at-rest. Along with our partners, we provide the expertise, training, and services that allow our customers to unlock the transformational value of data across any line of business. We are Powering the Future of Data.

Contact: For further information visit HORTON INTL: +44 (0) Hortonworks Inc. All Rights Reserved.

DAY 2 OBJECTIVES
The Anatomy of a Remote Process Group
Remote Process Group Transmission
NiFi Site-to-Site Communication
Describe the Function and Purpose of the NiFi Expression Language
The Structure of a NiFi Expression
How to Use Expression Language Functions
Using the Expression Language Editor
Using If/Then/Else in the NiFi Expression Language
Using Attributes and Properties
Create, Manage, and Instantiate NiFi Templates
How to Optimize an HDF Data Flow
Define Data Provenance and Data Provenance Events
Describe NiFi Cluster and State Management
Describe Cluster Setup and Management via the NiFi UI
Explain the Mechanisms Available for NiFi Monitoring

DAY 2 LABS AND DEMONSTRATIONS
Working with Remote Process Groups
Site-to-Site
Working with the NiFi Expression Language
Demonstration: Working with Attributes
Demonstration: Working with Templates
Demonstration: Data Provenance
Working with NiFi Clusters
Demonstration: NiFi Notification Services
Demonstration: NiFi Monitoring
Advanced NiFi Monitoring
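The NiFi Expression Language objectives above operate on FlowFile attributes. A few representative expressions; the filename attribute is standard, but the values and chaining shown are illustrative:

```
${filename}                                            value of the filename attribute
${filename:toUpper()}                                  apply a string function to it
${filename:contains('.csv'):ifElse('table','other')}   if/then/else by chaining on a boolean
${now():format('yyyy-MM-dd')}                          format the current date
```

Expressions like these are entered into processor property fields, where the Expression Language editor covered in the course provides completion and validation.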

DAY 3 OBJECTIVES
Describe How HDF Complements the Hortonworks Data Platform (HDP)
Describe How Big Data Ingestion Is Possible with HDF
Describe HDF Configuration Best Practices
Describe the Process of Securing HDF with 2-Way SSL
Describe LDAP User Authentication with NiFi
Describe Kerberos Authentication with NiFi
Describe HDF Multi-tenancy
Describe How the File-Based Authorizer in NiFi Works
Describe How the Ranger-Based Authorizer in NiFi Works
Describe the Architecture of Authorization via the Ranger-NiFi Plug-in
List the Installation Prerequisites, then Configure and Install Ranger
Describe How to Create Ranger Policies for NiFi

DAY 3 LABS
Integrating HDF with HDP
Securing HDF with 2-Way SSL Using Ambari
NiFi User Authentication with LDAP
Installing Ranger and Configuring NiFi with Kerberos
Working with File-Based Authorization in NiFi
Working with Ranger-Based Authorization in NiFi

Revised 10/19/2017

TRAINING OFFERING ADM-303 HORTONWORKS DATA PLATFORM (HDP) OPERATIONS: ADMINISTRATION 2 3 DAYS This course is designed for experienced administrators who manage Hortonworks Data Platform (HDP) 2.3 clusters with Ambari. It covers upgrades, configuration, application management, and other common tasks. PREREQUISITES Attendees should have attended HDP Operations: Hadoop Administration 1 or possess equivalent knowledge and experience. Attendees should be familiar with basic HDP administration and Linux environments. TARGET AUDIENCE IT administrators and operators responsible for configuring, managing, and supporting a Hadoop 2.3 deployment in a Linux environment using Ambari. FORMAT 50% Lecture/Discussion 50% Hands-on Labs AGENDA SUMMARY Day 1: Performing a Rolling Upgrade, Configuring Heterogeneous HDFS Storage Day 2: Deploying Applications with Slider, Integrating Ambari with LDAP and Hive Tuning Day 3: Apache Oozie High Availability, Introduction to Falcon, Automating Using Ambari Blueprints

DAY 1 OBJECTIVES List the HDP Upgrade Types List the HDP Upgrade Path Restrictions Prepare Databases and HDFS Describe the Process Used to Register a New HDP Version Execute Automated Installation of and Upgrades to HDP Clusters Identify the Purpose and Operation of Heterogeneous HDFS Storage Identify and Describe the HDFS Storage Types and Policies Identify HDFS NFS Gateway Use Cases Recall HDFS NFS Gateway Architecture and Operation Install and Configure an HDFS NFS Gateway Configure an HDFS NFS Gateway Client Summarize the Purpose and Operation of HDFS Centralized Caching Configure HDFS Centralized Cache Define and Manage Cache Pools and Cache Directives Summarize the Importance of File Format and Compression Algorithm Selection Summarize the Benefits and Considerations When Using Compression in Hadoop Describe the Administrator and Non-Administrator Roles in Managing Compression List and Describe Splittable Compression Formats Configure Default MapReduce and Tez Compression Algorithms DAY 1 LABS Setting Up the Lab Environment Performing a Rolling Upgrade Configuring HDFS Storage Policies Configuring an NFS Gateway Configuring HDFS Centralized Cache
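The storage policy and centralized caching objectives above map to the `hdfs storagepolicies` and `hdfs cacheadmin` command-line tools. A sketch (paths and pool names are invented, and the commands require a running HDFS cluster, so this is illustrative only):

```shell
# Pin an archival dataset to the COLD (ARCHIVE) storage policy
hdfs storagepolicies -setStoragePolicy -path /data/archive -policy COLD
hdfs storagepolicies -getStoragePolicy -path /data/archive

# Create a cache pool and cache a hot dataset in DataNode memory
hdfs cacheadmin -addPool hot-data
hdfs cacheadmin -addDirective -path /data/hot/lookup -pool hot-data
```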

DAY 2 OBJECTIVES Summarize the Purpose and Operation of YARN Node Labels Create, Add, Modify and Remove Node Labels Configure Queues to Access Node Label Resources Run Test Jobs to Confirm Node Label Behavior Recall the Purpose, Benefits and Components of Apache Slider Install and Manage an Apache Slider Application Package Using the Slider View Identify Characteristics of Ambari Local Versus LDAP Users and Groups Integrate Ambari Server with LDAP Configure YARN Queues, Tez and Hive Properties to Support Performance Goals Recall Basic Facts About Hive and the Hive Architecture Recall the Requirements and Benefits of Hive High Availability (HA) Summarize the Hive HA Architecture and Operation Configure and Test Hive HA Recall the Purpose, Job Types, Structure and Benefits of Apache Oozie Install and Configure Apache Oozie Using Ambari Deploy and Manage a Sample Apache Oozie Workflow DAY 2 LABS Configuring YARN Node Labels Deploying Applications Using Apache Slider Integrating Ambari with LDAP Configuring Hive High Availability Managing Workflows Using Apache Oozie

DAY 3 OBJECTIVES Recall the Purpose and Architecture of Apache Oozie Recall the Benefits of Apache Oozie High Availability (HA) Summarize the Apache Oozie HA Architecture and Operation Configure Apache Oozie HA Understand the Challenges of Data Governance in Large, Complex Environments Recall the Purpose and Capabilities of Falcon Understand the Purpose and Configuration of Cluster, Feed and Process Entities Create a Cluster Entity and Set Up Mirroring Using the Falcon UI Summarize the Purpose and Benefits of Ambari Blueprints Recall the Processes Used to Deploy a Cluster Using Ambari Blueprints Configure Ambari Blueprints Logical Cluster Configuration Files Configure Ambari Blueprints Cluster Creation Configuration Files Configure Ambari Blueprints Host Creation Configuration Files Identify Ambari Blueprints Configuration Property Precedence and Best Practices DAY 3 LABS Configuring Apache Oozie High Availability Configuring Data Replication Using Apache Falcon Automating Cluster Provisioning Using Apache Ambari Revised 10/06/

TRAINING OFFERING ADM-351 HDP HADOOP SECURITY 3 DAYS SUBJECT MATTER EXPERT This course is designed for experienced administrators who will be implementing secure Hadoop clusters using authentication, authorization, auditing and data protection strategies and tools. PREREQUISITES Students should be experienced in the management of Hadoop using Ambari and Linux environments. Completion of the following course is required before taking HDP Hadoop Security: ADM-221 HDP Hadoop Administration 1 Foundations TARGET AUDIENCE IT administrators and operators responsible for installing, configuring and supporting an Apache Hadoop deployment in a secure environment. FORMAT 50% Lecture/Discussion 50% Hands-on Labs LAB FORMAT The labs are set up to accommodate the customer's Kerberos KDC preference. The initial version of the labs used Active Directory as the Kerberos KDC; however, many customers now want to deploy an MIT KDC as the Kerberos KDC with a cross-realm trust to Active Directory. This course provides both environments so customers can choose the lab track that best fits their Kerberos KDC choice. Lab Tracks HDP 2.6 using Active Directory HDP 2.6 using MIT KDC

AGENDA SUMMARY Day 1: Defining Security, Securing Sensitive Data, Integrating HDP Security and HDP Security Prerequisites Day 2: Enabling Kerberos and Installing Apache Ranger Day 3: Secure Access with Ranger and an Apache Knox Overview and Installation DAY 1 OBJECTIVES Define Security Describe the Five Pillars of a Secure Environment List the Reasons Why a Secure Environment Is Needed Describe HDP Security List Security Implications Describe the Typical Flow of Security List Security Prerequisites Describe Authorization with Apache Ranger Describe the Use and Purpose of HDFS Encryption Describe the Use and Purpose of Wire Encryption Choose Which Security Tool Is Best for Specific Use Cases Describe the Purpose of the Apache Knox Gateway Describe the Process for Encryption with Apache Ranger KMS Describe the Purpose and Implementation Options of the Kerberos KDC Configure Ambari Security Describe How to Encrypt Database and LDAP Passwords Describe How to Set Up SSL for the Ambari Server Describe How to Set Up Two-Way SSL Between the Ambari Server and Agents Set Up Ambari Views for Controlled Access Describe Kerberos Use and Architecture Install Kerberos Describe Available Partner Security Solutions

DAY 1 LABS Setting Up the Lab Environment Configuring the AD Resolution Certificate Security Options for Ambari

DAY 2 OBJECTIVES Configure Ambari for Kerberos Configure Hadoop for Kerberos Enable Kerberos Describe the Purpose of Apache Ranger Describe the Apache Ranger Architecture List the Prerequisites for Apache Ranger Describe the Purpose of the Apache Ranger REST API List the Optional Apache Ranger Configurations Install and Configure Apache Knox Install and Configure Apache Ranger Install and Configure Ranger Key Management Services (KMS) Describe HDFS Encryption with Ranger KMS Describe the Purpose of the HDFS Encryption Zone DAY 2 LABS Kerberizing the Cluster Installing Apache Ranger Setting Up Apache Ranger KMS Data Encryption

DAY 3 OBJECTIVES Describe the Process for Integrating the Apache Ranger Plugin with: o HDFS o Hive o HBase o Knox o Storm Describe the Function and Purpose of Apache Knox Describe the Apache Knox Architecture Install and Configure Apache Knox Describe the Purpose of Ambari Views Set Up a Standalone Ambari Views Server Configure the Ambari Views Server for Kerberos Set Up Kerberos for: o Files View o Tez View o Pig View o Hive View DAY 3 LABS Secured Hadoop Exercises Configuring Apache Knox Exploring Other Security Features of Apache Ambari Revised 10/26/

TRAINING OFFERING ADM-305 HORTONWORKS DATA PLATFORM (HDP) OPERATIONS: APACHE HBASE ADVANCED MANAGEMENT 4 DAYS This course is designed for administrators who will be installing, configuring and managing HBase clusters. It covers installation with Ambari, configuration, security and troubleshooting HBase implementations. The course includes an end-of-course project in which students work together to design and implement an HBase schema. PREREQUISITES Students must have basic familiarity with data management systems. Familiarity with Hadoop or databases is helpful but not required. Students new to Hadoop are encouraged to take the HDP Overview: Apache Hadoop Essentials course. TARGET AUDIENCE Architects, software developers, and analysts responsible for implementing NoSQL databases in order to handle sparse datasets commonly found in big data use cases. FORMAT 50% Lecture/Discussion 50% Hands-on Labs AGENDA SUMMARY Day 1: An Apache HBase Overview and Installing HBase Day 2: Using the HBase Shell and Ingest/ImportTSV Day 3: Managing HA Clusters and Log Files, Backup and Recovery, and Security Day 4: Monitoring HBase, Maintenance, Troubleshooting and Class Project

DAY 1 OBJECTIVES Describe the Characteristics and Operation of HDFS Describe the Responsibilities of the NameNode and DataNode Describe the Purpose of YARN, Including the: o ResourceManager o NodeManager o ApplicationMaster Describe the Primary Differences Between Hadoop 1.x and 2.x Describe the Function and Purpose of HBase List HBase Features and Components Describe an HBase Table as a Set of Key-Value Mappings Identify HBase as Either a Row- or Column-Oriented Database Describe HBase Operations List the Options for HBase Installation List the HBase Minimum System Requirements Describe the Process for Installing HBase Using Ambari Describe the Process for Confirming a Successful Installation DAY 1 LABS Installing and Configuring HBase with Ambari Manually Installing an HBase Cluster

DAY 2 OBJECTIVES Work with Basic HBase Shell Commands List the Categories of Shell Commands Including: o General o Table Management o Data Manipulation o Surgery Tools o Cluster Replication Tools o Security Tools Work with Cluster Administration Commands Describe the Function and Purpose of the RegionServer Identify the Purpose of Key-Value Pairs Identify the Purpose of Row Keys Identify the Purpose of Column Families Describe How to Read and Write Data in HBase Describe the Flush Process Describe the Compaction Process Perform a Bulk Ingest Using ImportTSV Describe the Function and Purpose of CopyTable DAY 2 LABS Using HBase Shell Commands Ingesting Data with ImportTSV

DAY 3 OBJECTIVES List the Steps Required to Upgrade HBase Configure HBase for High Availability View Log Files Describe the Function and Purpose of HBase Coprocessors Describe the Function and Purpose of HBase Filters Describe the Process for Using Filters for Scans Describe the Process for Protecting HBase Data with Backups Describe the Function and Benefits of Snapshots in HBase Describe the Process for Performing Snapshots in HBase Describe the Process for HBase Replication Configure HBase Cluster Replication Describe the Purpose of HBase Authentication Describe the Purpose and Benefits of HBase Authorization Via ACLs Describe the Benefits of Ranger and Knox for HBase Security Describe the Process Used to Configure Simple Authentication Describe the Secure Bulk Load Process DAY 3 LABS Enabling HBase High Availability Viewing Log Files Configuring and Enabling Snapshots Configuring Cluster Replication Enabling Authentication and Authorization in HBase Tables

DAY 4 OBJECTIVES List Important Metrics to Monitor for an HBase Cluster Monitor an HBase Cluster Using Ambari Describe the Benefits of OpenTSDB as a Tool for Monitoring Describe How to Identify a Region Hot Spot Design a Row-Key Schema to Avoid Hot Spotting Configure an HBase Table Using Pre-Splitting Describe the Region Splitting Process Describe the Function of the Load Balancer Define Region Sizing Describe the Process of Manual Splitting and Merging Describe the Process of Resolving Region Overlap Issues Use the ZooKeeper Command Line Tool to Check ZooKeeper Status and State Monitor JVM Garbage Collection Metrics on RegionServers Resolve Startup Errors for the Master Server and RegionServers Tune HBase for Better Performance Tune HDFS for Better HBase Performance DAY 4 LABS Diagnosing and Resolving Hot Spotting Region Splitting Monitoring JVM Garbage Collection End of Course Lab Project Designing an HBase Schema Revised 02/27/
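Row-key design is the main defense against the region hot spotting covered above. As a toy illustration in plain Python (not the HBase API; the bucket count and key format are invented), salting prepends a hash-derived prefix so that monotonically increasing keys spread across pre-split regions instead of piling onto one:

```python
# Toy sketch of row-key salting to avoid HBase region hot spotting.
import hashlib

NUM_SALT_BUCKETS = 4  # assume the table was pre-split into 4 regions

def salted_row_key(key: str) -> str:
    # Derive a stable bucket number from the key, then prefix it.
    bucket = int(hashlib.md5(key.encode()).hexdigest(), 16) % NUM_SALT_BUCKETS
    return f"{bucket:02d}-{key}"

# Sequential timestamps no longer sort into a single hot region.
keys = [salted_row_key(f"2018-03-13T10:00:{s:02d}") for s in range(8)]
buckets = {k.split("-", 1)[0] for k in keys}
print(sorted(buckets))
```

The trade-off is that range scans must now fan out across all salt buckets, which is why the course pairs this with pre-splitting and load-balancer topics.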

TRAINING OFFERING DEV-201 HORTONWORKS DATA PLATFORM (HDP) DEVELOPER QUICK START 4 DAYS This 4-day training course is designed for developers who need to create applications to analyze Big Data stored in Apache Hadoop using Apache Pig and Apache Hive, and to develop applications on Apache Spark. Topics include: an essential understanding of HDP and its capabilities, Hadoop, YARN, HDFS, MapReduce/Tez, data ingestion, using Pig and Hive to perform data analytics on Big Data, and an introduction to Spark Core, Spark SQL, Apache Zeppelin, and additional Spark features. PREREQUISITES Students should be familiar with programming principles and have experience in software development. SQL and light scripting knowledge is also helpful. No prior Hadoop knowledge is required. TARGET AUDIENCE Developers and data engineers who need to understand and develop applications on HDP FORMAT 50% Lecture/Discussion 50% Hands-on Labs AGENDA SUMMARY Day 1: HDP Essentials and an Introduction to Pig Day 2: Apache Hive Day 3: Apache Spark Day 4: Apache Spark Continued

DAY 1 OBJECTIVES Describe the Case for Hadoop Describe the Trends of Volume, Velocity and Variety Discuss the Importance of Open Enterprise Hadoop Describe the Hadoop Ecosystem Frameworks Across the Following Five Architectural Categories: o Data Management o Data Access o Data Governance & Integration o Security o Operations Describe the Function and Purpose of the Hadoop Distributed File System (HDFS) List the Major Architectural Components of HDFS and their Interactions Describe Data Ingestion Describe Batch/Bulk Ingestion Options Describe the Streaming Framework Alternatives Describe the Purpose and Function of MapReduce Describe the Purpose and Components of YARN Describe the Major Architectural Components of YARN and their Interactions Define the Purpose and Function of Apache Pig Work with the Grunt Shell Work with Pig Latin Relation Names and Field Names Describe the Pig Data Types and Schema DAY 1 LABS AND DEMONSTRATIONS Starting an HDP Cluster Using HDFS Commands Demonstration: Understanding Apache Pig Getting Started with Apache Pig Exploring Data with Pig

DAY 2 OBJECTIVES Demonstrate Common Operators Such as: o ORDER BY o CASE o DISTINCT o PARALLEL o FOREACH Understand How Hive Tables Are Defined and Implemented Use Hive to Explore and Analyze Data Sets Explain and Use the Various Hive File Formats Create and Populate a Hive Table That Uses ORC File Formats Use Hive to Run SQL-like Queries to Perform Data Analysis Use Hive to Join Datasets Using a Variety of Techniques Write Efficient Hive Queries Explain the Uses and Purpose of HCatalog Use HCatalog with Pig and Hive DAY 2 LABS AND DEMONSTRATIONS Splitting a Dataset Joining Datasets Preparing Data for Apache Hive Understanding Apache Hive Tables Demonstration: Understanding Partitions and Skew Analyzing Big Data with Apache Hive Demonstration: Computing Ngrams Joining Datasets in Apache Hive Computing Ngrams of Emails in Avro Format Using HCatalog with Apache Pig
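The ORC and join objectives above look like the following in HiveQL. Table and column names are invented for illustration; this is a sketch, not a lab solution:

```sql
-- Create an ORC-backed table and populate it from a staging text table
CREATE TABLE drivers_orc (driver_id INT, name STRING, miles BIGINT)
STORED AS ORC;

INSERT OVERWRITE TABLE drivers_orc
SELECT driver_id, name, miles FROM drivers_staging;

-- A SQL-like analytic query joining two datasets
SELECT d.name, SUM(t.hours_logged) AS total_hours
FROM drivers_orc d
JOIN timesheet t ON d.driver_id = t.driver_id
GROUP BY d.name;
```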

DAY 3 OBJECTIVES Describe How to Perform a Multi-Table/File Insert Define and Use Views Define and Use Clauses and Windows List the Hive File Formats Including: o Text Files o SequenceFile o RCFile o ORC File Define Hive Optimization Use Apache Zeppelin to Work with Spark Describe the Purpose and Benefits of Spark Define Spark REPLs and Application Architecture Explain the Purpose and Function of RDDs Explain Spark Programming Basics Define and Use Basic Spark Transformations Define and Use Basic Spark Actions Invoke Functions for Multiple RDDs, Create Named Functions and Use Numeric Operations DAY 3 LABS Advanced Apache Hive Programming Introduction to Apache Spark REPLs and Apache Zeppelin Creating and Manipulating RDDs Creating and Manipulating Pair RDDs
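The transformation/action distinction in the Spark objectives above can be mimicked in plain Python (Spark itself is not used here, and the names are illustrative): generator expressions, like RDD transformations, are lazy and do no work until a terminal operation, the analogue of an action, consumes them.

```python
# Plain-Python sketch of Spark's lazy transformations vs. eager actions.
data = range(1, 6)                                  # stand-in for an RDD of 1..5
doubled = (x * 2 for x in data)                     # "transformation": lazy, no work yet
evens_over_four = (x for x in doubled if x > 4)     # chained transformation, still lazy
result = sum(evens_over_four)                       # "action": the pipeline runs here
print(result)  # prints 24 (6 + 8 + 10)
```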

DAY 4 OBJECTIVES Define and Create Pair RDDs Perform Common Operations on Pair RDDs Name the Various Components of Spark SQL and Explain their Purpose Describe the Relationship Between DataFrames, Tables and Contexts Use Various Methods to Create and Save DataFrames and Tables Understand Caching, Persisting and the Different Storage Levels Describe and Implement Checkpointing Create an Application to Submit to the Cluster Describe Client vs Cluster Submission with YARN Submit an Application to the Cluster List and Set Important Configuration Items DAY 4 LABS Creating and Saving DataFrames and Tables Working with DataFrames Building and Submitting Applications to YARN Revised 10/06/

TRAINING OFFERING DEV-203 HORTONWORKS DATA PLATFORM (HDP) DEVELOPER: APACHE PIG AND HIVE 4 DAYS This 4-day training course is designed for developers who need to create applications to analyze Big Data stored in Apache Hadoop using Pig and Hive. Topics include: Hadoop, YARN, HDFS, MapReduce, data ingestion, workflow definition, using Pig and Hive to perform data analytics on Big Data, and an introduction to Spark Core and Spark SQL. PREREQUISITES Students should be familiar with programming principles and have experience in software development. SQL knowledge is also helpful. No prior Hadoop knowledge is required. TARGET AUDIENCE Software developers who need to understand and develop applications for Hadoop. FORMAT 50% Lecture/Discussion 50% Hands-on Labs AGENDA SUMMARY Day 1: Understanding Hadoop and the Hadoop Distributed File System (HDFS) Day 2: Pig Programming Day 3: Hive Programming Day 4: Advanced Hive Programming, Hadoop 2 and YARN, Introduction to Spark Core

DAY 1 OBJECTIVES List the Three V's of Big Data List the Six Key Hadoop Data Types Describe Hadoop, YARN and Use Cases for Hadoop Describe Hadoop Ecosystem Tools and Frameworks Describe the Differences Between Relational Databases and Hadoop Describe What is New in Hadoop 2.x Describe the Hadoop Distributed File System (HDFS) Describe the Differences Between HDFS and an RDBMS Describe the Purpose of NameNodes and DataNodes List Common HDFS Commands Describe HDFS File Permissions List Options for Data Input Describe WebHDFS Describe the Purpose of Sqoop and Flume Describe How to Export to a Table Describe the Purpose of MapReduce Define Key/Value Pairs in MapReduce Describe the Map and Reduce Phases Describe Hadoop Streaming DAY 1 LABS AND DEMONSTRATIONS Starting an HDP Cluster Demonstration: Understanding Block Storage Using HDFS Commands Importing RDBMS Data into HDFS Exporting HDFS Data to an RDBMS Importing Log Data into HDFS Using Flume Demonstration: Understanding MapReduce Running a MapReduce Job
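The Map and Reduce phases listed above can be sketched in pure Python (this is an illustration of the data flow, not Hadoop code): the map phase emits (word, 1) key/value pairs, the shuffle groups pairs by key, and the reduce phase sums each group's values.

```python
# Pure-Python sketch of the MapReduce word-count flow: Map -> Shuffle -> Reduce.
from collections import defaultdict

def map_phase(lines):
    """Map: emit a (key, value) pair of (word, 1) for every word."""
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reduce: aggregate each key's list of values into a single count."""
    return {key: sum(values) for key, values in grouped.items()}

counts = reduce_phase(shuffle(map_phase(["Hadoop stores data", "Hadoop processes data"])))
print(counts)  # {'hadoop': 2, 'stores': 1, 'data': 2, 'processes': 1}
```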

DAY 2 OBJECTIVES Describe the Purpose of Apache Pig Describe the Purpose of Pig Latin Demonstrate the Use of the Grunt Shell List Pig Latin Relation Names and Field Names List Pig Data Types Define a Schema Describe the Purpose of the GROUP Operator Describe Common Pig Operators, Including o ORDER BY o CASE o DISTINCT o PARALLEL o FLATTEN o FOREACH Perform an Inner, Outer and Replicated Join Describe the Purpose of the DataFu Library DAY 2 LABS AND DEMONSTRATIONS Demonstration: Understanding Apache Pig Getting Started with Apache Pig Exploring Data with Apache Pig Splitting a Dataset Joining Datasets with Apache Pig Preparing Data for Apache Hive Demonstration: Computing Page Rank Analyzing Clickstream Data Analyzing Stock Market Data Using Quantiles
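A small Pig Latin fragment illustrates the relations, schemas, join, and ORDER BY operator listed above. File and field names are invented for illustration:

```pig
-- Load two tab-delimited relations with explicit schemas
users  = LOAD 'users.tsv'  AS (id:int, name:chararray);
orders = LOAD 'orders.tsv' AS (uid:int, amount:double);

-- Inner join on the user id, then sort by order amount
joined = JOIN users BY id, orders BY uid;
sorted = ORDER joined BY amount DESC;
DUMP sorted;
```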

DAY 3 OBJECTIVES Describe the Purpose of Apache Hive Describe the Differences Between Apache Hive and SQL Describe the Apache Hive Architecture Demonstrate How to Submit Hive Queries Describe How to Define Tables Describe How to Load Data Into Hive Define Hive Partitions, Buckets and Skew Describe How to Sort Data List Hive Join Strategies Describe the Purpose of HCatalog Describe the HCatalog Ecosystem Define a New Schema Demonstrate the Use of HCatLoader and HCatStorer with Apache Pig Perform a Multi-table/File Insert Describe the Purpose of Views Describe the Purpose of the OVER Clause Describe the Purpose of Windows List Hive Analytics Functions List Hive File Formats Describe the Purpose of Hive SerDe DAY 3 LABS AND DEMONSTRATIONS Understanding Hive Tables Understanding Partitions and Skew Analyzing Big Data with Apache Hive Demonstration: Computing NGrams Joining Datasets in Apache Hive Computing NGrams of Emails in Avro Format Using HCatalog with Apache Pig

DAY 4 OBJECTIVES Describe the Purpose of HDFS Federation Describe the Purpose of HDFS High Availability (HA) Describe the Purpose of the Quorum Journal Manager Demonstrate How to Configure Automatic Failover Describe the Purpose of YARN List the Components of YARN Describe the Lifecycle of a YARN Application Describe the Purpose of a Cluster View Describe the Purpose of Apache Slider Describe the Origin and Purpose of Apache Spark List Common Spark Use Cases Describe the Differences Between Apache Spark and MapReduce Demonstrate the Use of the Spark Shell Describe the Purpose of a Resilient Distributed Dataset (RDD) Demonstrate How to Load Data and Perform a Word Count Define Lazy Evaluation Describe How to Load Multiple Types of Data Demonstrate How to Perform SQL Queries Demonstrate How to Perform DataFrame Operations Describe the Purpose of the Optimization Engine Describe the Purpose of Apache Oozie Describe Apache Pig Actions Describe Apache Hive Actions Describe MapReduce Actions Describe How to Submit an Apache Oozie Workflow Define an Oozie Coordinator Job

DAY 4 LABS Advanced Apache Hive Programming Running a YARN Application Getting Started with Apache Spark Exploring Apache Spark SQL Defining an Apache Oozie Workflow Revised 10/06/

TRAINING OFFERING DEV-301 HORTONWORKS DATA PLATFORM (HDP) DEVELOPER: JAVA 4 DAYS This course provides Java programmers a deep-dive into Hadoop application development. Students will learn how to design and develop efficient and effective MapReduce applications for Hadoop using the Hortonworks Data Platform, including how to implement combiners, partitioners, secondary sorts, custom input and output formats, joining large datasets, unit testing, and developing UDFs for Pig and Hive. Labs are run on a 7-node HDP 2.1 cluster running in a virtual machine that students can keep for use after the training. PREREQUISITES Students must have experience developing Java applications and using a Java IDE. Labs are completed using the Eclipse IDE and Gradle. No prior Hadoop knowledge is required. TARGET AUDIENCE Experienced Java software engineers who need to develop Java MapReduce applications for Hadoop. FORMAT 50% Lecture/Discussion 50% Hands-on Labs AGENDA SUMMARY Day 1: Understanding Hadoop, the Hadoop Distributed File System (HDFS) and MapReduce Day 2: Partitioning, Sorting and Input/Output Formats Day 3: Optimizing MapReduce Jobs, Advanced MapReduce Features and HBase Programming Day 4: Pig and Hive Programming, Defining Workflows

DAY 1 OBJECTIVES Describe Hadoop 2.x and the Hadoop Distributed File System Describe the YARN Framework Describe the Purpose of NameNodes and DataNodes Describe the Purpose of HDFS High Availability (HA) Describe the Purpose of the Quorum Journal Manager List Common HDFS Commands Describe the Purpose of YARN List Open-Source YARN Use Cases List the Components of YARN Describe the Life Cycle of a YARN Application Define Map Aggregation Describe the Purpose of Combiners Describe the Purpose of In-Map Aggregation Describe the Purpose of Counters Describe the Purpose of User-Defined Counters DAY 1 LABS AND DEMONSTRATIONS Demonstration: Understanding Block Storage Configuring a Hadoop Development Environment Putting Files in HDFS with Java Demonstration: Understanding MapReduce Word Count Distributed Grep Inverted Index Using a Combiner Computing an Average

DAY 2 OBJECTIVES Describe the Purpose of a Partitioner List the Steps for Writing a Custom Partitioner Describe How to Create and Distribute a Partition File Describe the Purpose of Sorting Describe the Purpose of Custom Keys Describe How to Write a Group Comparator List the Built-In Input Formats Describe the Purpose of Input Formats Define a Record Reader Describe How to Handle Records that Span Splits List the Built-In Output Formats Describe How to Write a Custom Output Format Describe the Purpose of the MultipleOutputs Class DAY 2 LABS AND DEMONSTRATIONS Writing a Custom Partitioner Using TotalOrderPartitioner Custom Sorting Demonstration: Combining Input Files Processing Multiple Inputs Writing a Custom Input Format Customizing Output Working with a Simple Moving Average
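A partitioner's job, as described above, is to map a record's key to the reducer that will receive it. As a toy stand-in (plain Python, not the Hadoop API), the hash-mod scheme below mirrors the idea behind Hadoop's default HashPartitioner:

```python
# Toy sketch of a hash partitioner: key -> reducer index.
def get_partition(key: str, num_reducers: int) -> int:
    # Mask keeps the hash non-negative, mirroring the common Java idiom
    # (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks.
    return (hash(key) & 0x7FFFFFFF) % num_reducers

# Every key deterministically lands on exactly one of 3 reducers.
partitions = {k: get_partition(k, 3) for k in ["alpha", "beta", "gamma"]}
print(partitions)
```

A custom partitioner replaces this function with domain logic, e.g. routing all keys for one customer to the same reducer so its records are reduced together.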

DAY 3 OBJECTIVES List Optimization Best Practices Describe How to Optimize the Map and Reduce Phases Describe the Benefits of Data Compression Describe the Limits of Data Compression Describe the Configuration of Data Compression Describe the Purpose of a RawComparator Describe the Purpose of Localization List Scenarios for Performing Joins in MapReduce Describe the Purpose of the Bloom Filter Describe the Purpose of MRUnit and the MRUnit API Describe How to Set Up a Test Describe How to Test a Mapper Describe How to Test a Reducer Describe the Purpose of HBase Define the Differences Between a Relational Database and HBase Describe the HBase Architecture Demonstrate the Basics of HBase Programming Describe an HBase MapReduce Application DAY 3 LABS Using Data Compression Defining a RawComparator Performing a Map-Side Join Using a Bloom Filter Unit Testing a MapReduce Job Importing Data to HBase Creating an HBase MapReduce Job
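The Bloom filter mentioned above is a bit array plus k hash functions: membership tests can yield false positives but never false negatives, which is why a map-side join can use one to cheaply skip records that cannot possibly match. A minimal sketch (illustrative Python, not the Hadoop implementation; sizes are arbitrary):

```python
# Minimal Bloom filter sketch: add() sets k bit positions per item;
# might_contain() reports True only if all k positions are set.
import hashlib

class BloomFilter:
    def __init__(self, size=1024, hashes=3):
        self.size, self.hashes = size, hashes
        self.bits = [False] * size

    def _positions(self, item):
        # Derive k independent positions by salting one hash function.
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = True

    def might_contain(self, item):
        # False means "definitely absent"; True means "probably present".
        return all(self.bits[pos] for pos in self._positions(item))

bf = BloomFilter()
bf.add("user-42")
print(bf.might_contain("user-42"))
```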

DAY 4 OBJECTIVES Describe the Purpose of Apache Pig and Pig Latin Demonstrate the Use of the Grunt Shell List the Common Pig Data Types Describe the Purpose of the FOREACH GENERATE Operator Describe the Purpose of Pig User Defined Functions (UDFs) Describe the Purpose of Filter Functions Describe the Purpose of Accumulator UDFs Describe the Purpose of Algebraic Functions Describe the Purpose of Apache Hive Describe the Differences Between Apache Hive and SQL Describe the Apache Hive Architecture Describe How to Load Data Into Hive Demonstrate How to Perform Queries Describe the Purpose of Hive User Defined Functions (UDFs) Write a Hive UDF Describe the Purpose of HCatalog Describe the Purpose of Apache Oozie Describe How to Define an Oozie Workflow Describe Pig and Hive Actions Describe How to Define an Oozie Coordinator Job DAY 4 LABS AND DEMONSTRATIONS Demonstration: Understanding Pig Writing a Pig UDF Writing a Pig Accumulator Writing an Apache Hive UDF Defining an Oozie Workflow Working with TF-IDF and the JobControl Class Revised 10/06/

TRAINING OFFERING DEV-303 HORTONWORKS DATA PLATFORM (HDP) ANALYST: HBASE ESSENTIALS 2 DAYS This course is designed for big data analysts who want to use the HBase NoSQL database, which runs on top of HDFS to provide real-time read/write access to sparse datasets. Topics include HBase architecture, services, installation and schema design. PREREQUISITES Students must have basic familiarity with data management systems. Familiarity with Hadoop or databases is helpful but not required. Students new to Hadoop are encouraged to attend the HDP Overview: Apache Hadoop Essentials course. TARGET AUDIENCE Architects, software developers, and analysts responsible for implementing NoSQL databases in order to handle sparse datasets commonly found in big data use cases. FORMAT 50% Lecture/Discussion 50% Hands-on Labs AGENDA SUMMARY Day 1: A Hadoop Primer and HBase Overview Day 2: HBase Command Line Basics, HBase Installation and Configuration and HBase Schema Design

DAY 1 OBJECTIVES Distinguish Between Hadoop and the Hortonworks Data Platform Identify That Hadoop Is Composed of Multiple Apache Projects Describe How Hadoop Stores Files and Processes Data Describe the Hadoop Distributed File System (HDFS) Describe the Reason HBase Was Created List HBase Features List the Components of the HBase Architecture Describe an HBase Table as a Set of Key-Value Mappings Identify HBase as Either a Row- or Column-Oriented Database Describe the Features Available in HBase 1.0 Describe a High-Level View of the Overall HBase Architectural Design Discuss HBase Region Design and Implementation List HBase Services Use HBase Data Operations Describe HBase High Availability (HA) Options List HBase Operational Commands Outline an HBase Query Operation DAY 1 LABS Running a MapReduce Job Using HBase Importing Tables from MySQL Into HBase Working with Apache ZooKeeper Examining HBase Configuration Files

DAY 2 OBJECTIVES List the Shell Command Line Categories Use General Shell Commands Use Data Manipulation Shell Commands Use Surgery Tools Use Cluster Replication Tools Describe General Considerations for Installation and Configuration Describe HBase Configuration Requirements Describe Apache ZooKeeper Configuration Requirements Back Up HBase Tables and Metadata Identify How to Choose an Appropriate Rowkey Describe the HBase Data Model Discuss Sample Schema Use Cases Describe How to Optimize Block Size Describe How to Adjust Cache Size Use Bloom Filters Discuss General Optimization Methods DAY 2 LABS AND DEMONSTRATIONS Using HBase Shell Commands Performing a Backup and Using Snapshot Exporting with Apache Pig and Importing with ImportTSV Setting Block Size and Enabling Bloom Filters Demonstration: Using Java Data Access Object Revised 10/06/

TRAINING OFFERING DEV-305 HORTONWORKS DATA PLATFORM (HDP) DEVELOPER: APACHE STORM AND TRIDENT 2 DAYS This course provides a technical introduction to the fundamentals of Apache Storm and Trident that includes the concepts, terminology, architecture, installation, operation, and management of Storm and Trident. Simple Storm and Trident code excerpts are provided throughout the course. The course also includes an introduction to, and code samples for, Apache Kafka. Apache Kafka is a messaging system that is commonly used in concert with Storm and Trident. PREREQUISITES Students must have experience developing Java applications and using a Java IDE. Labs are completed using the Eclipse IDE and Gradle. Students should have a basic understanding of Hadoop. TARGET AUDIENCE Hadoop developers who need to be able to design and build Storm and Kafka applications using Java and the Trident API. FORMAT 50% Lecture/Discussion 50% Hands-on Labs AGENDA SUMMARY Day 1: Real-Time Data Processing, Introduction to Storm Components, Installing and Configuring Storm Day 2: Storm Management, Kafka Programming, An Introduction to Trident

DAY 1 OBJECTIVES Identify Whether Storm Performs Batch or Real-Time Processing Recognize Differences Between Batch and Real-Time Processing List Reasons Why Companies Deploy Storm Describe Storm Use Cases Define the Terms Tuple, Stream, Topology, Spout, Bolt, Nimbus and Supervisor Diagram the Relationship Between a Supervisor, Worker Process, Executor and a Task Given the Java Code for a Topology, Diagram the Spout and Bolt Connections Define the Purpose of a Stream Grouping Perform a Storm Installation Using the Hortonworks Data Platform and Ambari Given a List of Storm Configuration Sources, Order Them by Precedence Identify the Primary, Installation-Specific Storm Configuration Files Identify the URL Useful for Reading Storm Configuration Parameter Descriptions List Differences Between Storm Local Mode and Distributed Mode Identify Reasons to Use Storm Local Mode Given a JAR File Name and the Package Name of a Topology, Build the Storm Command Necessary to Submit the Topology to the Cluster Given a Topology Code Example, Describe the Spout and Bolt Connections in the Topology Identify the Purpose of the Multi-lang Protocol Identify the Differences Between Reliable and Unreliable Operation Diagram a Tuple Tree and Identify Its Branches List Three Methods to Disable Reliable Operation DAY 1 LABS Configuring a Storm Development Environment Storm Word Count Using Storm Multi-lang Support Processing Log Files

DAY 2 OBJECTIVES
List Tools Used to Manage and Monitor Storm
Display Online Help Using the Storm Command-Line Client
Determine When It Is Appropriate to Use the Storm List, Activate, Deactivate, Rebalance and Kill Commands
Identify How to Open the Storm UI Console
Interpret the Metrics Displayed on the Storm UI Console
Recognize Use Cases for Kafka
Describe the Components of Kafka
Explain the Concept of a Topic Leader and Followers
Describe the Publication and Consumption of Kafka Messages
Define a New Kafka Topic
Configure and Instantiate a Kafka Spout for a Storm and Trident Topology
List Differences Between Storm and Trident
List Characteristics of a Trident Topology
List the Benefits of Batch Processing
Describe the Purpose and Operation of the each() Method
Describe the Purpose and Operation of a Trident Filter
Describe the Types of Aggregation Operations
List the Three Types of Trident States

DAY 2 LABS
Integrating Kafka with Storm
Using Trident
Using Trident with Kafka
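Trident's batch model, filters, and aggregators can be previewed with a language-neutral sketch: tuples arrive in small batches, a filter drops some tuples, and an aggregator reduces each batch to a single value. The function names below are ours for illustration, not Trident's Java API.

```python
# Pure-Python sketch of Trident's micro-batch processing model:
# filter each batch's tuples, then aggregate each batch to one value.

def keep_positive(value):
    """Plays the role of a Trident Filter: return False to drop a tuple."""
    return value > 0

def count_aggregator(batch):
    """Plays the role of an aggregator: reduce a batch to a single count."""
    return sum(1 for _ in batch)

batches = [[3, -1, 4], [-5, 9, 2, -6]]       # tuples arrive batch by batch
per_batch_counts = [count_aggregator(v for v in batch if keep_positive(v))
                    for batch in batches]
print(per_batch_counts)                      # [2, 2]
```

Processing whole batches rather than single tuples is what lets Trident amortize state updates and offer exactly-once semantics on top of Storm.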

TRAINING OFFERING
DEV-307 HORTONWORKS DATA PLATFORM (HDP) DEVELOPER: CUSTOM APACHE YARN APPLICATIONS
2 DAYS

This course is designed for developers who want to create custom YARN applications for Apache Hadoop. It covers the YARN architecture, YARN development steps, writing a YARN client and ApplicationMaster, and launching Containers. The course uses Eclipse and Gradle connected remotely to a 7-node HDP cluster running in a virtual machine.

PREREQUISITES
Students should be experienced Java developers who have attended HDP Developer: Java OR HDP Developer: Pig and Hive, OR are experienced with Hadoop and MapReduce development.

TARGET AUDIENCE
Experienced Java developers who need to create custom YARN applications for Apache Hadoop.

FORMAT
50% Lecture/Discussion, 50% Hands-on Labs

AGENDA SUMMARY
Day 1: Introduction to YARN
Day 2: Writing a YARN ApplicationMaster and Working with Containers and Job Scheduling

DAY 1 OBJECTIVES
Describe the YARN Architecture
Describe the Purpose of the ResourceManager
Describe the Purpose of the YARN Scheduler
Describe the Purpose of the Application Manager
Describe the Purpose of the ApplicationMaster Launcher
Describe the Purpose of the ContainerAllocationExpirer
Describe the Purpose of the NodeManager
Describe the Purpose of the ApplicationMaster
Describe the Purpose of Containers
Describe the Life Cycle of a YARN Application
Describe the YARN API
List the Local Resource Types
List the Requirements of a YARN Client
Run a YARN Application on a Hadoop Cluster
Monitor the Status of a Running YARN Application
Configure a ContainerLaunchContext
Use a LocalResource to Share Application Files Across a Cluster

DAY 1 LABS
Running a YARN Application
Setting Up a YARN Development Environment
Writing a YARN Client
Submitting an ApplicationMaster

DAY 2 OBJECTIVES
Write a YARN ApplicationMaster
Describe the Differences Between Synchronous and Asynchronous ApplicationMasters
Allocate Containers in a Cluster
Launch Containers on NodeManagers
Describe the Purpose of Container Tokens
Write a Custom Container to Perform Specific Business Logic
Explain the Job Schedulers of the ResourceManager
Describe the Purpose of the Capacity Scheduler
Describe How to Configure the Capacity Scheduler
Describe How to Configure Capacity Limits
Describe How to Configure Queue Permissions
Define Queues for the Capacity Scheduler
Describe the Purpose of Fair Scheduling

DAY 2 LABS
Writing an ApplicationMaster
Requesting Containers
Writing Custom Containers
Putting It All Together
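The Capacity Scheduler configuration covered on Day 2 lives in capacity-scheduler.xml. As a hedged preview, a minimal fragment defining two queues under root might look like the following; the queue names and the 70/30 split are examples only, not defaults.

```xml
<!-- Illustrative capacity-scheduler.xml fragment: two example queues
     under root, splitting cluster capacity 70/30. -->
<property>
  <name>yarn.scheduler.capacity.root.queues</name>
  <value>engineering,analytics</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.engineering.capacity</name>
  <value>70</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.analytics.capacity</name>
  <value>30</value>
</property>
```

Per-queue capacities under a parent must sum to 100; related properties (such as maximum-capacity and ACLs) control the capacity limits and queue permissions listed in the objectives.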

TRAINING OFFERING
DEV-343 SPARK DEVELOPER
4 DAYS
SUBJECT MATTER EXPERT

This course introduces the Apache Spark distributed computing engine and is suitable for developers, data analysts, architects, technical managers, and anyone who needs to use Spark in a hands-on manner. It is based on the Spark 2.x release. The course provides a solid technical introduction to the Spark architecture and how Spark works. It covers the basic building blocks of Spark (e.g., RDDs and the distributed compute engine) as well as higher-level constructs that provide a simpler and more capable interface. It includes in-depth coverage of Spark SQL, DataFrames, and DataSets, which are now the preferred programming API, along with possible performance issues and strategies for optimization. The course also covers more advanced capabilities such as the use of Spark Streaming to process streaming data and integration with the Kafka server.

PREREQUISITES
Students should be familiar with programming principles and have previous experience in software development using Scala. Previous experience with data streaming, SQL, and HDP is also helpful, but not required.

TARGET AUDIENCE
Software engineers looking to develop in-memory applications for time-sensitive and highly iterative applications in an enterprise HDP environment.

FORMAT
50% Lecture/Discussion, 50% Hands-on Labs

AGENDA SUMMARY
Day 1: Scala Ramp Up, Introduction to Spark
Day 2: RDDs and Spark Architecture, Spark SQL, DataFrames and DataSets
Day 3: Shuffling, Transformations and Performance, Performance Tuning
Day 4: Creating Standalone Applications and Spark Streaming

DAY 1 OBJECTIVES
Scala Introduction
Working with:
  o Variables
  o Data Types
  o Control Flow
The Scala Interpreter
Collections and their Standard Methods (e.g. map())
Working with:
  o Functions
  o Methods
  o Function Literals
Define the Following as They Relate to Scala:
  o Class
  o Object
  o Case Class
Overview, Motivations, Spark Systems
Spark Ecosystem
Spark vs. Hadoop
Acquiring and Installing Spark
The Spark Shell, SparkContext

DAY 1 LABS
Setting Up the Lab Environment
Starting the Scala Interpreter
A First Look at Spark
A First Look at the Spark Shell

DAY 2 OBJECTIVES
RDD Concepts, Lifecycle, Lazy Evaluation
RDD Partitioning and Transformations
Working with RDDs, Including:
  o Creating and Transforming (map, filter, etc.)
An Overview of RDDs
SparkSession, Loading/Saving Data, Data Formats (JSON, CSV, Parquet, text...)
Introducing DataFrames and DataSets (Creation and Schema Inference)
Identify Supported Data Formats, Including:
  o JSON
  o Text
  o CSV
  o Parquet
Working with the DataFrame (Untyped) Query DSL, Including:
  o Column
  o Filtering
  o Grouping
  o Aggregation
SQL-based Queries
Working with the DataSet (Typed) API
Mapping and Splitting (flatMap(), explode(), and split())
DataSets vs. DataFrames vs. RDDs

DAY 2 LABS
RDD Basics
Operations on Multiple RDDs
Data Formats
Spark SQL Basics
DataFrame Transformations
The DataSet Typed API
Splitting Up Data
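The lazy-evaluation concept listed above is language-neutral and can be previewed without Spark: transformations only declare a pipeline, and no element is touched until an action forces it. The generator-based toy below is our own illustration, not Spark code.

```python
# Toy illustration of lazy evaluation: map/filter build a pipeline,
# and nothing executes until an action (here, list()) forces it.

log = []  # records every unit of work actually performed

def toy_map(f, data):
    for x in data:
        log.append("map")
        yield f(x)

def toy_filter(pred, data):
    for x in data:
        log.append("filter")
        if pred(x):
            yield x

pipeline = toy_map(lambda x: x * 2, toy_filter(lambda x: x % 2 == 0, range(6)))
print(len(log))          # 0 -- pipeline declared, no work done yet

result = list(pipeline)  # the "action": elements now flow through both stages
print(result)            # [0, 4, 8]
```

The same shape explains why Spark can fuse chained transformations into a single pass over each partition instead of materializing intermediate results.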

DAY 3 OBJECTIVES
Working with:
  o Grouping
  o Reducing
  o Joining
Shuffling, Narrow vs. Wide Dependencies, and Performance Implications
Exploring the Catalyst Query Optimizer (explain(), Query Plans, Issues with Lambdas)
The Tungsten Optimizer (Binary Format, Cache Awareness, Whole-Stage Code Gen)
Discuss Caching, Including:
  o Concepts
  o Storage Type
  o Guidelines
Minimizing Shuffling for Increased Performance
Using Broadcast Variables and Accumulators
General Performance Guidelines
  o Using the Spark UI
  o Efficient Transformations
  o Data Storage
  o Monitoring

DAY 3 LABS
Exploring Group Shuffling
Seeing Catalyst at Work
Seeing Tungsten at Work
Working with Caching
Joins, Shuffles, Broadcasts, Accumulators
Broadcast General Guidelines
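Why grouping and reducing force a shuffle can be shown with a pure-Python sketch (this is the idea, not Spark's implementation): pairs are hash-partitioned by key so that all values for one key land in the same partition, after which each partition can reduce locally.

```python
# Pure-Python sketch of a shuffle: hash-partition pairs by key,
# then reduce within each partition.
from collections import defaultdict

pairs = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)]
num_partitions = 2

# "Shuffle": route each pair to a partition based on its key's hash,
# so every value for a given key ends up in the same partition.
partitions = [defaultdict(int) for _ in range(num_partitions)]
for key, value in pairs:
    partitions[hash(key) % num_partitions][key] += value

# Each partition now holds the complete totals for its keys.
totals = {k: v for part in partitions for k, v in part.items()}
print(sorted(totals.items()))   # [('a', 4), ('b', 7), ('c', 4)]
```

On a cluster, the routing step means moving data across the network between executors, which is why wide dependencies like groupByKey are the dominant performance cost and why the course emphasizes minimizing shuffles.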

DAY 4 OBJECTIVES
Core API, SparkSession.Builder
Configuring and Creating a SparkSession
Building and Running Applications - sbt/build.sbt and spark-submit
Application Lifecycle (Driver, Executors, and Tasks)
Cluster Managers (Standalone, YARN, Mesos)
Logging and Debugging
Introduction and Streaming Basics
Spark Streaming (Spark 1.0+)
  o DStreams, Receivers, Batching
  o Stateless Transformation
  o Windowed Transformation
  o Stateful Transformation
Structured Streaming (Spark 2+)
  o Continuous Applications
  o Table Paradigm, Result Table
  o Steps for Structured Streaming
  o Sources and Sinks
Consuming Kafka Data
  o Kafka Overview
  o Structured Streaming - "kafka" Format
  o Processing the Stream

DAY 4 LABS
Spark Job Submission
Additional Spark Capabilities
Spark Streaming
Spark Structured Streaming
Spark Structured Streaming with Kafka
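The windowed-transformation idea above can be previewed without a streaming engine: events carry timestamps, and counts are kept per fixed-size window. This pure-Python sketch (our own, not Spark Streaming code) uses 10-second tumbling windows.

```python
# Illustrative windowed count: bucket timestamped events into
# fixed 10-second windows and count events per window.

events = [(1, "click"), (4, "click"), (12, "click"), (13, "view"), (25, "click")]
window_size = 10  # seconds

counts = {}
for ts, _ in events:
    window_start = (ts // window_size) * window_size  # window the event falls in
    counts[window_start] = counts.get(window_start, 0) + 1

print(counts)   # {0: 2, 10: 2, 20: 1}
```

Structured Streaming generalizes this picture with its table paradigm: each window becomes a row in an ever-growing result table that sinks consume incrementally.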

TRAINING OFFERING
HORTONWORKS DATA PLATFORM (HDP) REAL-TIME DEVELOPMENT
4 DAYS

This 4-day training course is designed for developers who need to create real-time applications to ingest and process streaming data sources using Hortonworks Data Platform (HDP) and Hortonworks DataFlow (HDF) environments. Specific technologies covered include Apache Hadoop, Apache Kafka, Apache Storm, Apache Spark, and Apache HBase, as well as Apache NiFi. The highlight of the course is the custom workshop-styled labs that allow participants to build streaming applications with Storm and Spark Streaming.

PREREQUISITES
Students should be familiar with programming principles and have experience in software development. Java programming experience is required. SQL and light scripting knowledge is also helpful. No prior Hadoop knowledge is required.

TARGET AUDIENCE
Developers and data engineers who need to understand and develop real-time/streaming applications on HDP and HDF

AGENDA SUMMARY
Day 1: Real-Time Architecture and Components
Day 2: Real-Time Processing with Spark Streaming
Day 3: Real-Time Processing with Storm
Day 4: Building DataFlows with HDF/NiFi

DAY 1 OBJECTIVES
Describe the Real-Time Architecture
Define the Purpose and Function of Apache Hadoop
Describe the Hadoop Ecosystem Frameworks
Describe the Role of Hadoop in the Datacenter
Describe the Hadoop Distributed File System (HDFS)
Detail the Major Architectural Components of HDFS and their Interactions
Demonstrate How to Use Apache Zeppelin with Apache Spark
List the Major Functions of Apache Zeppelin
Describe the Purpose and Benefits of Apache Spark
List the Spark High-Level Tools
Define Spark REPLs and Application Architecture
Explain the Purpose and Function of Resilient Distributed Datasets (RDDs)
List the Characteristics of an RDD
Explain Spark Programming Basics
Define and Use Basic Spark Transformations
Define and Use Basic Spark Actions
Describe an Anonymous Function
Invoke Functions for Multiple RDDs, Create Named Functions and Use Numeric Operations

DAY 1 LABS
Validating the Lab Environment
Using HDFS Commands
Introduction to Spark REPLs and Zeppelin
Creating and Manipulating RDDs

DAY 2 OBJECTIVES
Define and Create Pair RDDs
Perform Common Operations on Pair RDDs
Describe Spark Streaming
Create and View Basic Data Streams
Perform Basic Transformations on Streaming Data
Utilize Window Transformations on Streaming Data
Recognize Use Cases for Apache Kafka
Explain the Concept of a Topic Leader and Followers
Describe the Publication and Consumption of Kafka Messages
Describe the Function and Purpose of Apache HBase
List Apache HBase Key Features
List the Components of the Apache HBase Architecture
Describe Apache HBase as a Set of Value Mappings
Identify Apache HBase as Either a Row- or Column-Oriented Database
Demonstrate How to Invoke the HBase Shell
List General HBase Commands
List HBase Table Management Commands
List HBase Data Manipulation Commands

DAY 2 LABS
Creating and Manipulating Pair RDDs
Basic Spark Streaming
Basic Spark Streaming Transformations
Spark Streaming Window Transformations
Creating and Managing Apache Kafka Topics
Using the HBase Shell
Working with HBase Column Families
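The "set of value mappings" view of HBase above has a simple mental model: a table is a sparse, nested map from row key to column family to column qualifier to value. The nested-dict sketch below is our own illustration, not the HBase client API.

```python
# HBase as nested value mappings:
# row key -> column family -> column qualifier -> value.

table = {
    "user1": {"info": {"name": "Ada", "city": "London"},
              "stats": {"logins": "42"}},
    "user2": {"info": {"name": "Lin"}},   # sparse: no "city", no "stats" family
}

def get_cell(table, row, family, qualifier):
    """Look up one cell; missing rows/families/qualifiers yield None."""
    return table.get(row, {}).get(family, {}).get(qualifier)

print(get_cell(table, "user1", "stats", "logins"))  # 42
print(get_cell(table, "user2", "info", "city"))     # None
```

The sparseness is the point: absent cells cost nothing to store, which is why rows in the same HBase table can have completely different columns within a family.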

DAY 3 OBJECTIVES
Define the Terms Tuple, Stream, Topology, Spout, Bolt, Nimbus and Supervisor
Diagram the Relationship Between a Supervisor, Worker Process, Executor and a Task
Diagram How Storm Components Interact to Provide Scalable, Distributed and Parallel Computation of Real-Time Data
Given the Java Code for a Topology, Diagram the Spout and Bolt Connections
Define the Purpose of a Stream Grouping
List the Types of Stream Groupings
Recognize and Explain Sample Spout and Bolt Java Code
List Functions that Apache ZooKeeper Provides to Apache Storm
List the Differences Between Storm Local Mode and Distributed Mode
Given a Topology Code Example, Describe the Spout and Bolt Connections in the Topology
Describe How to Integrate Apache Storm with Apache Kafka
List Tools Used to Manage Apache Storm
Display Online Help Using the Storm Command-Line Client
Identify How to Open the Storm UI Console
Interpret the Metrics Displayed in the Apache Storm UI Console
Identify the Differences Between Reliable and Unreliable Operation
Diagram a Tuple Tree and Identify Its Branches
List the Two Requirements for Reliable Operation
Given a Diagram, Describe the Operation of an Acker Task
Describe the Responses to Various Apache Storm Component Failures
List Three Methods to Disable Reliable Operation

DAY 3 LABS
Creating a Word Count Topology
Performing a Kafka Word Count
Using Storm with Kafka and HBase
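The acker-task objective above rests on an elegant trick worth previewing: Storm's acker tracks an entire tuple tree with a single running XOR of tuple ids. Each id is XORed in when the tuple is emitted and XORed in again when it is acked; the tree is complete exactly when the value returns to zero. A minimal sketch of that bookkeeping:

```python
# Sketch of the XOR bookkeeping a Storm acker task performs per tuple tree:
# XOR each tuple id in on emit, XOR it in again on ack.
# x ^ x == 0, so a fully acked tree drives the running value back to zero.
import random

ack_val = 0
tuple_ids = [random.getrandbits(64) for _ in range(3)]  # spout tuple + children

for tid in tuple_ids:      # tuples emitted into the tree
    ack_val ^= tid
for tid in tuple_ids:      # tuples acked by downstream bolts
    ack_val ^= tid

print(ack_val == 0)        # prints True: the whole tree is fully acked
```

Because only one 64-bit value is kept per spout tuple, reliability tracking stays constant-size no matter how large the tuple tree grows.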

DAY 4 OBJECTIVES
Define Enterprise Data Flow
Describe the Purpose and Function of HDF 2.0
Describe HDF 2.0 Components
Describe How IoT is Driving New Requirements
List NiFi Architecture and Features
List the Three Key Concepts of Apache NiFi
Install and Configure NiFi
List Configuration Best Practices
Describe the Components of the NiFi User Interface
Define the Anatomy of a Processor
Define the Anatomy of a Connection
Describe the Purpose of Controller Services and Reporting Tasks
Build a NiFi Data Flow
Describe the Command and Control of a Data Flow
Describe How to Start and Stop a Component
Describe the Anatomy of a Processor Group
Define Data Provenance and Data Provenance Events
Describe NiFi Clustering and State Management
Describe a Basic Cluster Setup

DAY 4 LABS
Demonstration: The NiFi User Interface
Controller Services and Reporting Tasks
Building a NiFi DataFlow
Working with Processor Groups
Working with Remote Processor Groups
Integrating HDF with HDP

TRAINING OFFERING
SCI-221 DATA SCIENCE FOUNDATIONS
3 DAYS
FOUNDATION

This course provides instruction on the processes and practice of data science, including machine learning and natural language processing. It covers tools and programming languages including Python, Jupyter, Mahout, Hive, NumPy, Pandas, SciPy, Scikit-learn, and Spark MLlib.

PREREQUISITES
Students must have experience with at least one programming or scripting language, knowledge in statistics and/or mathematics, and a basic understanding of big data and Hadoop principles. Students new to Hadoop are encouraged to attend the HDP Overview: Apache Hadoop Essentials course.

TARGET AUDIENCE
Architects, software developers, analysts, and data scientists who need to apply data science and machine learning on Hadoop.

FORMAT
50% Lecture/Discussion, 50% Hands-on Labs

AGENDA SUMMARY
Day 1: Introduction to Data Science, Python, Hadoop, and Machine Learning
Day 2: Working with Spark RDDs, DataFrames and SparkSQL, Visualization in Zeppelin
Day 3: Machine Learning Algorithms, Natural Language Processing, and Spark MLlib

DAY 1 OBJECTIVES
Define Data Science and Explain What a Data Scientist Does
Differentiate Between Different Types of Data Roles
List a Number of Data Science Use Cases
Present an Overview of Python
List and Describe Python Programming Components
Import Python Modules
Develop Python Code
List the Python Packages that Comprise the Scientific Python Ecosystem and Explain their Use Cases
Utilize the Jupyter Notebook
Demonstrate How to Use NumPy Core Functionality
Explain the Data Structures in the Ecosystem
Describe the Components of the Big Data Scientific Stack
Explain the Benefits of Big Data for Machine Learning
Describe How Files are Stored in a Distributed Manner
Give a High-Level Overview of Zeppelin and Spark
Explain the Different Task Types of Machine Learning
Explain How Supervised Learning Differs from Unsupervised Learning
Describe the Modeling Process
Explain What a Feature and a Target or Label Is in Machine Learning

DAY 1 LABS AND DEMONSTRATIONS
Using IPython
Data Analysis with Python
Using HDFS Commands
Introduction to Spark REPLs and Zeppelin
Using Apache Mahout

DAY 2 OBJECTIVES
Explain What an RDD Is
Explain How RDDs are Partitioned
Create, Manipulate, and Restore RDDs
Describe What Lazy Evaluation Means and the Two Types of Spark Operations
Define, Create and Perform Common Functions with Pair RDDs
Explain What a DataFrame Is and How it Differs from an RDD and a Dataset
Demonstrate How to Create and Manipulate DataFrames
Explain the Benefit of the Catalyst Optimizer
Use SparkSQL to Create Tables
Demonstrate How to Use Visualization and Collaboration in Zeppelin
Use Dynamic Forms
Create an Application to Submit to the Cluster
Describe Client vs. Cluster Submission with YARN
Submit an Application to the Cluster
List and Set Important Configuration Items

DAY 2 LABS AND DEMONSTRATIONS
Create and Manipulate RDDs
Create and Save DataFrames
Build and Submit Spark Applications

DAY 3 OBJECTIVES
Describe Common Machine Learning Applications
List the Pros and Cons of Various Algorithms
Explain What Natural Language Processing Is
Describe Common Tasks in the Field of NLP
Utilize NLTK
Explain the Difference Between spark.mllib and spark.ml
Explain What Pipelines Do
Explain the Feature Engineering Capabilities of Spark MLlib
Build a Classifier with MLlib

DAY 3 LABS AND DEMONSTRATIONS
Use the Python Natural Language Toolkit (NLTK)
Classify Text Using Naïve Bayes
Compute K-Nearest Neighbors
Creating a Spam Classifier with MLlib
Sentiment Analysis with Spark MLlib
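In the spirit of the k-nearest-neighbors lab above, the algorithm fits in a few lines of plain Python: find the k training points closest to a query point and take a majority vote of their labels. The toy dataset and function names below are ours, not the course's materials.

```python
# Minimal k-nearest-neighbors classifier: Euclidean distance + majority vote.
from collections import Counter
import math

train = [((1.0, 1.0), "a"), ((1.2, 0.8), "a"),
         ((5.0, 5.0), "b"), ((5.2, 4.8), "b")]

def knn_predict(point, train, k=3):
    """Sort training points by distance to `point`, vote among the k nearest."""
    nearest = sorted(train, key=lambda item: math.dist(point, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

print(knn_predict((1.1, 0.9), train))  # a
print(knn_predict((4.9, 5.1), train))  # b
```

An odd k avoids ties in two-class problems; in practice features are also scaled first, since raw Euclidean distance lets large-valued features dominate the vote.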

LEARN FROM THE BIG DATA EXPERTS
Hortonworks University courses are designed by the leaders and committers of Apache Hadoop. We provide immersive, real-world experience in scenario-based training. Courses offer unmatched depth and expertise, available both in the classroom and online from anywhere in the world. We prepare you to be an expert with highly valued skills and to pursue certification.


MODERN BIG DATA DESIGN PATTERNS CASE DRIVEN DESINGS MODERN BIG DATA DESIGN PATTERNS CASE DRIVEN DESINGS SUJEE MANIYAM FOUNDER / PRINCIPAL @ ELEPHANT SCALE www.elephantscale.com sujee@elephantscale.com HI, I M SUJEE MANIYAM Founder / Principal @ ElephantScale

More information

Course 6231A: Maintaining a Microsoft SQL Server 2008 Database

Course 6231A: Maintaining a Microsoft SQL Server 2008 Database Course 6231A: Maintaining a Microsoft SQL Server 2008 Database About this Course This five-day instructor-led course provides students with the knowledge and skills to maintain a Microsoft SQL Server 2008

More information

Hortonworks Data Platform

Hortonworks Data Platform Hortonworks Data Platform Apache Ambari Administration (March 5, 2018) docs.hortonworks.com Hortonworks Data Platform: Apache Ambari Administration Copyright 2012-2018 Hortonworks, Inc. Some rights reserved.

More information

Installing SmartSense on HDP

Installing SmartSense on HDP 1 Installing SmartSense on HDP Date of Publish: 2018-07-12 http://docs.hortonworks.com Contents SmartSense installation... 3 SmartSense system requirements... 3 Operating system, JDK, and browser requirements...3

More information

Maintaining a Microsoft SQL Server 2008 Database (Course 6231A)

Maintaining a Microsoft SQL Server 2008 Database (Course 6231A) Duration Five days Introduction Elements of this syllabus are subject to change. This five-day instructor-led course provides students with the knowledge and skills to maintain a Microsoft SQL Server 2008

More information

HADOOP COURSE CONTENT (HADOOP-1.X, 2.X & 3.X) (Development, Administration & REAL TIME Projects Implementation)

HADOOP COURSE CONTENT (HADOOP-1.X, 2.X & 3.X) (Development, Administration & REAL TIME Projects Implementation) HADOOP COURSE CONTENT (HADOOP-1.X, 2.X & 3.X) (Development, Administration & REAL TIME Projects Implementation) Introduction to BIGDATA and HADOOP What is Big Data? What is Hadoop? Relation between Big

More information

Activator Library. Focus on maximizing the value of your data, gain business insights, increase your team s productivity, and achieve success.

Activator Library. Focus on maximizing the value of your data, gain business insights, increase your team s productivity, and achieve success. Focus on maximizing the value of your data, gain business insights, increase your team s productivity, and achieve success. ACTIVATORS Designed to give your team assistance when you need it most without

More information

Knox Implementation with AD/LDAP

Knox Implementation with AD/LDAP Knox Implementation with AD/LDAP Theory part Introduction REST API and Application Gateway for the Apache Hadoop Ecosystem: The Apache Knox Gateway is an Application Gateway for interacting with the REST

More information

Big Data and Hadoop. Course Curriculum: Your 10 Module Learning Plan. About Edureka

Big Data and Hadoop. Course Curriculum: Your 10 Module Learning Plan. About Edureka Course Curriculum: Your 10 Module Learning Plan Big Data and Hadoop About Edureka Edureka is a leading e-learning platform providing live instructor-led interactive online training. We cater to professionals

More information

Oracle Big Data Fundamentals Ed 1

Oracle Big Data Fundamentals Ed 1 Oracle University Contact Us: +0097143909050 Oracle Big Data Fundamentals Ed 1 Duration: 5 Days What you will learn In the Oracle Big Data Fundamentals course, learn to use Oracle's Integrated Big Data

More information

NA120 Network Automation 10.x Essentials

NA120 Network Automation 10.x Essentials Course Data Sheet NA120 Network Automation 10.x Essentials Course No.: NA120-101 Category/Sub Category: Operations Management/Network Management Center For software version(s): 9.0 10.1 Software version

More information

Academy Catalogue - Customers-

Academy Catalogue - Customers- Academy Catalogue - Customers- Last update: 1/17/2019 2019 Tagetik Software - All Rights Reserved This document contains the CCH Tagetik Academy courses catalogue, with detailed information about optimal

More information

Hadoop Development Introduction

Hadoop Development Introduction Hadoop Development Introduction What is Bigdata? Evolution of Bigdata Types of Data and their Significance Need for Bigdata Analytics Why Bigdata with Hadoop? History of Hadoop Why Hadoop is in demand

More information

Things Every Oracle DBA Needs to Know about the Hadoop Ecosystem. Zohar Elkayam

Things Every Oracle DBA Needs to Know about the Hadoop Ecosystem. Zohar Elkayam Things Every Oracle DBA Needs to Know about the Hadoop Ecosystem Zohar Elkayam www.realdbamagic.com Twitter: @realmgic Who am I? Zohar Elkayam, CTO at Brillix Programmer, DBA, team leader, database trainer,

More information

Cmprssd Intrduction To

Cmprssd Intrduction To Cmprssd Intrduction To Hadoop, SQL-on-Hadoop, NoSQL Arseny.Chernov@Dell.com Singapore University of Technology & Design 2016-11-09 @arsenyspb Thank You For Inviting! My special kind regards to: Professor

More information

Developing Enterprise Cloud Solutions with Azure

Developing Enterprise Cloud Solutions with Azure Developing Enterprise Cloud Solutions with Azure Java Focused 5 Day Course AUDIENCE FORMAT Developers and Software Architects Instructor-led with hands-on labs LEVEL 300 COURSE DESCRIPTION This course

More information

Introduction to Big Data

Introduction to Big Data Introduction to Big Data OVERVIEW We are experiencing transformational changes in the computing arena. Data is doubling every 12 to 18 months, accelerating the pace of innovation and time-to-value. The

More information

ISILON ONEFS WITH HADOOP KERBEROS AND IDENTITY MANAGEMENT APPROACHES. Technical Solution Guide

ISILON ONEFS WITH HADOOP KERBEROS AND IDENTITY MANAGEMENT APPROACHES. Technical Solution Guide ISILON ONEFS WITH HADOOP KERBEROS AND IDENTITY MANAGEMENT APPROACHES Technical Solution Guide Hadoop and OneFS cluster configurations for secure access and file permissions management ABSTRACT This technical

More information

HDInsight > Hadoop. October 12, 2017

HDInsight > Hadoop. October 12, 2017 HDInsight > Hadoop October 12, 2017 2 Introduction Mark Hudson >20 years mixing technology with data >10 years with CapTech Microsoft Certified IT Professional Business Intelligence Member of the Richmond

More information

Administration 1. DLM Administration. Date of Publish:

Administration 1. DLM Administration. Date of Publish: 1 DLM Administration Date of Publish: 2018-05-18 http://docs.hortonworks.com Contents Replication concepts... 3 HDFS cloud replication...3 Hive cloud replication... 3 Cloud replication guidelines and considerations...4

More information

Hadoop Online Training

Hadoop Online Training Hadoop Online Training IQ training facility offers Hadoop Online Training. Our Hadoop trainers come with vast work experience and teaching skills. Our Hadoop training online is regarded as the one of the

More information

Hadoop. Introduction to BIGDATA and HADOOP

Hadoop. Introduction to BIGDATA and HADOOP Hadoop Introduction to BIGDATA and HADOOP What is Big Data? What is Hadoop? Relation between Big Data and Hadoop What is the need of going ahead with Hadoop? Scenarios to apt Hadoop Technology in REAL

More information

Hortonworks SmartSense

Hortonworks SmartSense Hortonworks SmartSense Installation (January 8, 2018) docs.hortonworks.com Hortonworks SmartSense: Installation Copyright 2012-2018 Hortonworks, Inc. Some rights reserved. The Hortonworks Data Platform,

More information

Hortonworks Certified Developer (HDPCD Exam) Training Program

Hortonworks Certified Developer (HDPCD Exam) Training Program Hortonworks Certified Developer (HDPCD Exam) Training Program Having this badge on your resume can be your chance of standing out from the crowd. The HDP Certified Developer (HDPCD) exam is designed for

More information

Big Data Analytics. Description:

Big Data Analytics. Description: Big Data Analytics Description: With the advance of IT storage, pcoressing, computation, and sensing technologies, Big Data has become a novel norm of life. Only until recently, computers are able to capture

More information

Chase Wu New Jersey Institute of Technology

Chase Wu New Jersey Institute of Technology CS 644: Introduction to Big Data Chapter 4. Big Data Analytics Platforms Chase Wu New Jersey Institute of Technology Some of the slides were provided through the courtesy of Dr. Ching-Yung Lin at Columbia

More information

NetVault Backup Web-based Training Bundle - 2 Student Pack

NetVault Backup Web-based Training Bundle - 2 Student Pack NetVault Backup Web-based Training Bundle - 2 Student Pack Description Get access to both Netvault Backup Implementation & Administration Web-based Training course and Netvault Backup Advanced Administration

More information

Hortonworks Data Platform

Hortonworks Data Platform Data Governance () docs.hortonworks.com : Data Governance Copyright 2012-2017 Hortonworks, Inc. Some rights reserved. The, powered by Apache Hadoop, is a massively scalable and 100% open source platform

More information

Big Data Hadoop Stack

Big Data Hadoop Stack Big Data Hadoop Stack Lecture #1 Hadoop Beginnings What is Hadoop? Apache Hadoop is an open source software framework for storage and large scale processing of data-sets on clusters of commodity hardware

More information

Administration 1. DLM Administration. Date of Publish:

Administration 1. DLM Administration. Date of Publish: 1 DLM Administration Date of Publish: 2018-07-03 http://docs.hortonworks.com Contents ii Contents Replication Concepts... 4 HDFS cloud replication...4 Hive cloud replication... 4 Cloud replication guidelines

More information

1 Big Data Hadoop. 1. Introduction About this Course About Big Data Course Logistics Introductions

1 Big Data Hadoop. 1. Introduction About this Course About Big Data Course Logistics Introductions Big Data Hadoop Architect Online Training (Big Data Hadoop + Apache Spark & Scala+ MongoDB Developer And Administrator + Apache Cassandra + Impala Training + Apache Kafka + Apache Storm) 1 Big Data Hadoop

More information

MAPR DATA GOVERNANCE WITHOUT COMPROMISE

MAPR DATA GOVERNANCE WITHOUT COMPROMISE MAPR TECHNOLOGIES, INC. WHITE PAPER JANUARY 2018 MAPR DATA GOVERNANCE TABLE OF CONTENTS EXECUTIVE SUMMARY 3 BACKGROUND 4 MAPR DATA GOVERNANCE 5 CONCLUSION 7 EXECUTIVE SUMMARY The MapR DataOps Governance

More information

Securing the Oracle BDA - 1

Securing the Oracle BDA - 1 Hello and welcome to this online, self-paced course titled Administering and Managing the Oracle Big Data Appliance (BDA). This course contains several lessons. This lesson is titled Securing the Oracle

More information

In-memory data pipeline and warehouse at scale using Spark, Spark SQL, Tachyon and Parquet

In-memory data pipeline and warehouse at scale using Spark, Spark SQL, Tachyon and Parquet In-memory data pipeline and warehouse at scale using Spark, Spark SQL, Tachyon and Parquet Ema Iancuta iorhian@gmail.com Radu Chilom radu.chilom@gmail.com Big data analytics / machine learning 6+ years

More information

NE Administering Windows Server 2012

NE Administering Windows Server 2012 NE-20411 Administering Windows Server 2012 Summary Duration 5 Days Audience IT Professionals Level 200 Technology Windows Server 2012 Delivery Method Instructor-led (Classroom) Training Credits N/A Introduction

More information

Citrix XenApp 6.5 Administration

Citrix XenApp 6.5 Administration Citrix XenApp 6.5 Administration CXA206; 5 Days, Instructor-led Course Description Citrix XenApp 6.5 Administration training course provides the foundation necessary for administrators to effectively centralize

More information

CXA Citrix XenApp 6.5 Administration

CXA Citrix XenApp 6.5 Administration 1800 ULEARN (853 276) www.ddls.com.au CXA-206-1 Citrix XenApp 6.5 Administration Length 5 days Price $5500.00 (inc GST) Citrix XenApp 6.5 Administration training course provides the foundation necessary

More information

Managing and Monitoring a Cluster

Managing and Monitoring a Cluster 2 Managing and Monitoring a Cluster Date of Publish: 2018-04-30 http://docs.hortonworks.com Contents ii Contents Introducing Ambari operations... 5 Understanding Ambari architecture... 5 Access Ambari...

More information

Hortonworks Data Platform

Hortonworks Data Platform Apache Ambari Views () docs.hortonworks.com : Apache Ambari Views Copyright 2012-2017 Hortonworks, Inc. All rights reserved. The, powered by Apache Hadoop, is a massively scalable and 100% open source

More information

Duration Level Technology Delivery Method Training Credits. Classroom ILT 5 Days Advanced SQL Server

Duration Level Technology Delivery Method Training Credits. Classroom ILT 5 Days Advanced SQL Server NE-20764C Administering a SQL Database Infrastructure Summary Duration Level Technology Delivery Method Training Credits Classroom ILT 5 Days Advanced SQL Virtual ILT On Demand SATV Introduction This 5-day

More information

Hadoop Security. Building a fence around your Hadoop cluster. Lars Francke June 12, Berlin Buzzwords 2017

Hadoop Security. Building a fence around your Hadoop cluster. Lars Francke June 12, Berlin Buzzwords 2017 Hadoop Security Building a fence around your Hadoop cluster Lars Francke June 12, 2017 Berlin Buzzwords 2017 Introduction About me - Lars Francke Partner & Co-Founder at OpenCore Before that: EMEA Hadoop

More information

Automation of Rolling Upgrade for Hadoop Cluster without Data Loss and Job Failures. Hiroshi Yamaguchi & Hiroyuki Adachi

Automation of Rolling Upgrade for Hadoop Cluster without Data Loss and Job Failures. Hiroshi Yamaguchi & Hiroyuki Adachi Automation of Rolling Upgrade for Hadoop Cluster without Data Loss and Job Failures Hiroshi Yamaguchi & Hiroyuki Adachi About Us 2 Hiroshi Yamaguchi Hiroyuki Adachi Hadoop DevOps Engineer Hadoop Engineer

More information

Top 25 Big Data Interview Questions And Answers

Top 25 Big Data Interview Questions And Answers Top 25 Big Data Interview Questions And Answers By: Neeru Jain - Big Data The era of big data has just begun. With more companies inclined towards big data to run their operations, the demand for talent

More information

Red Hat Certified System Administrator (RHCSA) RHCSA 7 Requirements and Syllabus

Red Hat Certified System Administrator (RHCSA) RHCSA 7 Requirements and Syllabus Red Hat Certified System Administrator (RHCSA) RHCSA 7 Requirements and Syllabus In preparation to earn the Red Hat Certified System Administrator (RHCSA), Red Hat recommends the following: For System

More information

Hortonworks Data Platform

Hortonworks Data Platform Hortonworks Data Platform Apache Ambari Upgrade for IBM Power Systems (May 17, 2018) docs.hortonworks.com Hortonworks Data Platform: Apache Ambari Upgrade for IBM Power Systems Copyright 2012-2018 Hortonworks,

More information

Hadoop: The Definitive Guide

Hadoop: The Definitive Guide THIRD EDITION Hadoop: The Definitive Guide Tom White Q'REILLY Beijing Cambridge Farnham Köln Sebastopol Tokyo labte of Contents Foreword Preface xv xvii 1. Meet Hadoop 1 Daw! 1 Data Storage and Analysis

More information

Introduction into Big Data analytics Lecture 3 Hadoop ecosystem. Janusz Szwabiński

Introduction into Big Data analytics Lecture 3 Hadoop ecosystem. Janusz Szwabiński Introduction into Big Data analytics Lecture 3 Hadoop ecosystem Janusz Szwabiński Outlook of today s talk Apache Hadoop Project Common use cases Getting started with Hadoop Single node cluster Further

More information

Data Virtualization Implementation Methodology and Best Practices

Data Virtualization Implementation Methodology and Best Practices White Paper Data Virtualization Implementation Methodology and Best Practices INTRODUCTION Cisco s proven Data Virtualization Implementation Methodology and Best Practices is compiled from our successful

More information

EsgynDB Enterprise 2.0 Platform Reference Architecture

EsgynDB Enterprise 2.0 Platform Reference Architecture EsgynDB Enterprise 2.0 Platform Reference Architecture This document outlines a Platform Reference Architecture for EsgynDB Enterprise, built on Apache Trafodion (Incubating) implementation with licensed

More information

Table of Index Hadoop for Developers Hibernate: Using Hibernate For Java Database Access HP FlexNetwork Fundamentals, Rev. 14.21 HP Navigating the Journey to Cloud, Rev. 15.11 HP OneView 1.20 Rev.15.21

More information

Oracle Big Data Connectors

Oracle Big Data Connectors Oracle Big Data Connectors Oracle Big Data Connectors is a software suite that integrates processing in Apache Hadoop distributions with operations in Oracle Database. It enables the use of Hadoop to process

More information

Course 6231A: Maintaining a Microsoft SQL Server 2008 Database

Course 6231A: Maintaining a Microsoft SQL Server 2008 Database Course 6231A: Maintaining a Microsoft SQL Server 2008 Database OVERVIEW About this Course Elements of this syllabus are subject to change. This five-day instructor-led course provides students with the

More information