Activator Library. Focus on maximizing the value of your data, gain business insights, increase your team s productivity, and achieve success.

Similar documents
Lambda Architecture for Batch and Stream Processing. October 2018

WHITEPAPER. MemSQL Enterprise Feature List

microsoft

DATA SCIENCE USING SPARK: AN INTRODUCTION

BIG DATA COURSE CONTENT

Blended Learning Outline: Developer Training for Apache Spark and Hadoop (180404a)

HDInsight > Hadoop. October 12, 2017

Asanka Padmakumara. ETL 2.0: Data Engineering with Azure Databricks

Hadoop 2.x Core: YARN, Tez, and Spark. Hortonworks Inc All Rights Reserved

Microsoft Azure Databricks for data engineering. Building production data pipelines with Apache Spark in the cloud

Big Data Syllabus. Understanding big data and Hadoop. Limitations and Solutions of existing Data Analytics Architecture

Big Data. Big Data Analyst. Big Data Engineer. Big Data Architect

Lambda Architecture for Batch and Real- Time Processing on AWS with Spark Streaming and Spark SQL. May 2015

Big Data Hadoop Developer Course Content. Big Data Hadoop Developer - The Complete Course Course Duration: 45 Hours

Best Practices and Performance Tuning on Amazon Elastic MapReduce

Exam Questions

Microsoft. Exam Questions Perform Data Engineering on Microsoft Azure HDInsight (beta) Version:Demo

Microsoft. Exam Questions Perform Data Engineering on Microsoft Azure HDInsight (beta) Version:Demo

Blended Learning Outline: Cloudera Data Analyst Training (171219a)

exam. Microsoft Perform Data Engineering on Microsoft Azure HDInsight. Version 1.0

Unifying Big Data Workloads in Apache Spark

Hadoop. Introduction / Overview

MODERN BIG DATA DESIGN PATTERNS CASE DRIVEN DESINGS

Understanding the latent value in all content

IBM Data Science Experience White paper. SparkR. Transforming R into a tool for big data analytics

Overview. Prerequisites. Course Outline. Course Outline :: Apache Spark Development::

Integrate MATLAB Analytics into Enterprise Applications

CONSOLIDATING RISK MANAGEMENT AND REGULATORY COMPLIANCE APPLICATIONS USING A UNIFIED DATA PLATFORM

Apache Ignite - Using a Memory Grid for Heterogeneous Computation Frameworks A Use Case Guided Explanation. Chris Herrera Hashmap

The Evolution of a Data Project

Modern Data Warehouse The New Approach to Azure BI

The Evolution of Big Data Platforms and Data Science

Towards a Real- time Processing Pipeline: Running Apache Flink on AWS

Intro to Big Data on AWS Igor Roiter Big Data Cloud Solution Architect

An Introduction to Big Data Formats

Stages of Data Processing

Big Data Technology Ecosystem. Mark Burnette Pentaho Director Sales Engineering, Hitachi Vantara

Overview. : Cloudera Data Analyst Training. Course Outline :: Cloudera Data Analyst Training::

Quick Install for Amazon EMR

Talend Big Data Sandbox. Big Data Insights Cookbook

Hadoop course content

Innovatus Technologies

Integrate MATLAB Analytics into Enterprise Applications

Overview of Data Services and Streaming Data Solution with Azure

Bringing Data to Life

Hadoop, Yarn and Beyond

New Features and Enhancements in Big Data Management 10.2

Cloud Computing & Visualization

Big Data com Hadoop. VIII Sessão - SQL Bahia. Impala, Hive e Spark. Diógenes Pires 03/03/2018

Developing Microsoft Azure Solutions (70-532) Syllabus

Down the event-driven road: Experiences of integrating streaming into analytic data platforms

Azure Data Factory. Data Integration in the Cloud

Leverage the Oracle Data Integration Platform Inside Azure and Amazon Cloud

Cloud Analytics and Business Intelligence on AWS

CERTIFICATE IN SOFTWARE DEVELOPMENT LIFE CYCLE IN BIG DATA AND BUSINESS INTELLIGENCE (SDLC-BD & BI)

Oracle Big Data Discovery

What s New at AWS? A selection of some new stuff. Constantin Gonzalez, Principal Solutions Architect, Amazon Web Services

Automated Netezza to Cloud Migration

Big Data on AWS. Big Data Agility and Performance Delivered in the Cloud. 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

The Hadoop Ecosystem. EECS 4415 Big Data Systems. Tilemachos Pechlivanoglou

MAPR DATA GOVERNANCE WITHOUT COMPROMISE

Big Data Hadoop Stack

Hortonworks Data Platform

We are ready to serve Latest Testing Trends, Are you ready to learn?? New Batches Info

Enabling Data Control in a Multi-Cloud World

Certified Big Data and Hadoop Course Curriculum

Practical Big Data Processing An Overview of Apache Flink

Hadoop. Course Duration: 25 days (60 hours duration). Bigdata Fundamentals. Day1: (2hours)

Data Storage Infrastructure at Facebook

Container 2.0. Container: check! But what about persistent data, big data or fast data?!

Data Analytics at Logitech Snowflake + Tableau = #Winning

Developing Microsoft Azure Solutions (70-532) Syllabus

BI ENVIRONMENT PLANNING GUIDE

In-memory data pipeline and warehouse at scale using Spark, Spark SQL, Tachyon and Parquet

Developing Microsoft Azure Solutions (70-532) Syllabus

Migrate from Netezza Workload Migration

The Reality of Qlik and Big Data. Chris Larsen Q3 2016

Franck Mercier. Technical Solution Professional Data + AI Azure Databricks

Migrate from Netezza Workload Migration

Delving Deep into Hadoop Course Contents Introduction to Hadoop and Architecture

Best practices for building a Hadoop Data Lake Solution CHARLOTTE HADOOP USER GROUP

CSE 444: Database Internals. Lecture 23 Spark

Databricks, an Introduction

Shark: SQL and Rich Analytics at Scale. Michael Xueyuan Han Ronny Hajoon Ko

Big Data with Hadoop Ecosystem

Index. Scott Klein 2017 S. Klein, IoT Solutions in Microsoft s Azure IoT Suite, DOI /

QLIK INTEGRATION WITH AMAZON REDSHIFT

Integrating Splunk with AWS services:

Serverless Computing. Redefining the Cloud. Roger S. Barga, Ph.D. General Manager Amazon Web Services

Integrate MATLAB Analytics into Enterprise Applications

Certified Big Data Hadoop and Spark Scala Course Curriculum

Flash Storage Complementing a Data Lake for Real-Time Insight

Scalable Tools - Part I Introduction to Scalable Tools

Microsoft Perform Data Engineering on Microsoft Azure HDInsight.

Apache HAWQ (incubating)

Informatica Data Lake Management on the AWS Cloud

Information empowerment for your evolving data ecosystem

AWS Serverless Architecture Think Big

MATLAB. Senior Application Engineer The MathWorks Korea The MathWorks, Inc. 2

Ian Choy. Technology Solutions Professional

Transcription:

Focus on maximizing the value of your data, gain business insights, increase your team s productivity, and achieve success.

ACTIVATORS Designed to give your team assistance when you need it most without requiring a major time commitment or lengthy scope of work. Activators let you focus on maximizing the value of your data, gain business insights, increase your team s productivity, and achieve success. These iterative remote sessions provide you with expert guidance in onetime configurations, optimizations, troubleshooting and transfers of knowledge. Although not all Activators are the same where some are quick one session solutions and others will require a series of meetings, the outcomes remain the same deliver maximum ROI. How it Works Activators are a convenient way to assist you with moving your big data projects forward and fit into your day when you re available. Simply review your business objectives, review our library and request one or a series of Activators and one of our experts will schedule a call to determine your requirements and match you with an expert(s) with the best skills. Discovery Web Conference Remote Services Review The first step in completing each Activator is a discovery session to determine your business requirements and align on a successful outcome. Through a series of collaborative sessions and remote services our team of experts will iterate through a series of tasks to define, complete and validate the outcome. The final step to completing each Activator is agreement that the outcome satisfies your business requirements and next steps, if required, are discussed. Don t see what you re looking for. Ask us how we can create a custom Activator that meets your business needs. 2

ACTIVATORS: ACCOUNT CONFIGURATION Account Configuration Activators focus on cloud account access, permissions, policies and storage configuration options. Select any one of these Activators to ensure your Qubole account is configured properly. REF TITLE OVERVIEW ACT01 Configure Qubole Account with IAM Role Learn how to configure Qubole with an IAM crossaccount role. ACT02 Configure Qubole Account with Dual IAM Roles Learn how to configure Qubole with a dual IAM role. ACT03 Configure Qubole Account with Service Principal Learn how to configure Qubole Account with service principal. ACT04 Configure Shared S3 Learn how to configure an S3 bucket to be shared across multiple AWS accounts. ACT05 Configure Access to Multiple Blob Storage Accounts Learn how to configure a QDS cluster to securely access multiple Blob storage accounts. ACT06 Configure Amazon Virtual Private Cloud (VPC) Learn how to configure a Qubole cluster in an Amazon VPC with public and private subnets. ACT07 Configure an Azure Virtual Network Learn how to configure a Qubole cluster in an Azure Virtual Network. ACT08 Migrate to Custom Metastore Gain flexibility with a custom metastore. Get expert help with migrating an existing account to a custom Hive metastore. ACT09 Troubleshoot Account Configuration Get expert advice on troubleshooting account connectivity and access configuration issues. CLOUD PLATFORM AWS AWS Azure AWS Azure AWS Azure AWS and Azure AWS and Azure OUTCOMES Qubole account configured with an IAM Cross-Account Role Qubole account configured with dual IAM role Configured Qubole Azure account Configured S3 bucket accessible by more than one account Configured access to storage Qubole cluster running within a VPC Configured virtual network Custom metastore Functioning Qubole AWS account 3

ACTIVATORS: SECURITY Security Activators provide ease of access to your data in a structured and controlled manner. Select any one of these Activators to further secure your data and Qubole environment. REF TITLE OVERVIEW OUTCOMES SEC01 Manage Access to QDS Develop a data governance strategy for your organization. SEC02 Configure Single Sign- on (SSO) Create a seamless secure login process for your organization by enabling SSO with SAML or Google Authorization Service. Data security strategy Groups, users and roles Configured SSO access to Qubole SEC03 Configure Encryption for Data at Rest and in Transit Provide secure access to your organization s data. Encrypted data SEC04 Security Review on Data Requests Learn security best practices when accessing data through queries and commands. SEC05 Configure Hive Authorization Enhance security on data residing in Qubole HDFS with Hive Authorization. Security review on one data request Secured Hive tables 4

ACTIVATORS: CLUSTER MANAGEMENT Cluster Management Activators provide best practices around deploying and managing clusters. Select any one of these Activators to learn more about proper cluster configuration and run your workloads at optimal performance. REF TITLE OVERVIEW CLUSTER OUTCOMES CLM01 Configure a QDS Cluster Learn best practices and appropriately size clusters to run at peak performance and optimal cost. CLM02 Create Custom Bootstrap Script Customize the installation, management and configuration of the tools useful for cluster monitoring and data loading. Hive, Spark and Hive, Spark, and Create an optimally configured Spark cluster and obtain knowledge around Spark cluster sizing best practices and cost optimization Enable additional functionality on the Hive cluster CLM03 Add User Defined Function (UDF) Expand the functionality of one of Qubole s data engines with a custom UDF. CLM04 Add Custom JAR File Expand the functionality of one of Qubole s data engines with a custom JAR. CLM05 Configure New Qubole Environment Expand the functionality of one of Qubole s data engines with a custom JAR. CLM06 Configure RubiX Improve the performance of your queries and share the cache data across multiple jobs. CLM07 Configure an Cluster Learn the fundamentals of configuring an cluster. Hive, Spark and Hive, Spark and Hive, Spark and Configure and deploy a custom UDF Enable a JAR file for cluster Create a New Qubole Environment configured to run with RubiX Configured cluster 5

ACTIVATORS: OPTIMIZATION Optimization Activators assist you with tuning and troubleshooting your jobs. Select any on of these Activators to learn more about getting peak performance and reduction in costs. REF TITLE OVERVIEW CLUSTER OUTCOMES OPM01 Optimize a Qubole Job Be it increased performance or cost reduction get the most out of each job through optimization best practices. OPM02 Troubleshoot a Qubole Job Unblock dependent operations and learn the fundamentals of troubleshooting a job running on a Qubole cluster. OPM03 Enable Spark Streaming Process real-time data from various data pipelines while reusing the same code for batch processing, join historical data, or run ad-hoc queries on stream state. OPM04 Tune Spark Application with Sparklens Learn how to tune Spark applications with Sparklens. Hive, Spark and Hive, Spark and Spark Spark Workload optimized for performance and cost Restored dependent operations Functionality to leverage streaming use cases Knowledge on leveraging Sparklens to optimize Spark workloads 6

ACTIVATORS: MIGRATION Migration Activators give you the flexibility to use the right data engine solution for the challenge at hand. Select any one of these Activators to learn more about a new data engine and steps required for migration. REF TITLE OVERVIEW CLUSTER OUTCOMES MIG01 Migrate SQL Job to Hive SQL Realize the benefits of Apache Hive and migrate a SQL job to Hive SQL. MIG02 Migrate Hive MapReduce Job to TEZ Build high performance batch and interactive data processing native YARN applications. MIG03 Migrate Hadoop 1 Job to Hadoop 2 Realize the benefits in scalability, performance and reliability with Hadoop 2. MIG04 Migrate Hadoop Job to Hive Realize the benefits of Apache Hive and migrate a Hadoop job to Hive SQL. MIG05 Migrate Hadoop Job to Provide analysts with low latency ad-hoc SQL query capabilities across HDFS and other data sources. MIG06 Migrate Hive Job to Spark Expand the capabilities of your data engineering and data scientists teams with Apache Spark. Hive Hive Hadoop Hive Spark Migrated and tested Hive QL Tez job that Integrates natively with Apache Hadoop YARN and performs well within mixed workload clusters Updated code running on Hadoop 2 Migrated and tested Hive job Migrated and tested job Migrated and tested Spark job MIG07 Migrate Hive Job to Migrate one Hive query to. Migrated and tested job 7

ACTIVATORS: SCHEDULERS Scheduler Activators allow your workloads to run on time and error free. Select any one of these Activators to learn more about Qubole s built in Scheduler or for advanced workflow scheduling. REF TITLE OVERVIEW SCHEDULER OUTCOMES SCH01 Schedule a Job with the Qubole Scheduler Configure jobs, specify intervals at which they run, and learn more about the additional capabilities that make the Qubole Scheduler a powerful tool for automating your workflow. SCH02 Create and Deploy DAG to an Cluster Learn the fundamentals of developing and deploying DAGs to an cluster. SCH03 Implement Connections and Variables Implement Connections and Variables for a single use case or DAG. SCH04 Create DAG to Schedule an External Job Manage workloads external to Qubole with an DAG. SCH05 Troubleshoot a DAG Learn the fundamentals of troubleshooting a DAG. SCH06 Migrate an Existing DAG to Qubole Migrate an existing schedule DAG to Qubole. SCH07 Integrate with External Schedule Schedule a Qubole Command using Qubole Operator and an externally managed. Qubole Scheduled workload Validated and deployed DAG in Implemented connection or variable for Qubole DAG running an external process Tested running DAG Operating DAG on a Qubole cluster Operating DAG on an external cluster 8

ACTIVATORS: DATA MANAGEMENT Data Management Activators are key to the ingestion and exploration of your datasets. Select any one of these Activators to start importing, exporting and exploring your data. REF TITLE OVERVIEW OUTCOMES DMT01 Configure a Datastore Manage data sets through import and export procedures in Qubole. DMT02 Migrate Data to a Different Format Prepare data for consumption by multiple users and applications. DMT03 Data Migration Across Cloud Providers Work with an expert to migrate data from one cloud platform to another for Qubole consumption. Configured datastore Tested Qubole Hive code required to convert data to a new format Data migrated to new cloud platform DMT04 Import, Export and Query Data with Analyze Commands Learn how to use Analyze to import, export and run queries against internal and external datasets. Learn how to use Analyze and successfully import, export and query data DMT05 Troubleshoot Analyze Commands Troubleshoot issues with the import, export or query of data. Restored dependent operations 9

ACTIVATORS: NOTEBOOKS AND DASHBOARDS Notebook and Dashboard Activators provide an easy way to analyze, visualize and share data. Select any one of these Activators to explore this functionality and leverage the benefits of Notebooks and Dashboards. REF TITLE OVERVIEW OUTCOMES NDS01 Use Qubole Notebooks Ingest, analyze, visualize and share data through the use of notebooks. NDS02 Configure a Notebook Interpreter Use a specific language/data processing backend in the Notebook. NDS03 Migrate a Jupyter Notebook Assist with the migration of a local Jupyter notebook to a Qubole Spark notebook. NDS04 Implement a Local Jupyter Notebook Configure a local Jupyter notebook with Livy and Sparkmagic. NDS06 Use Qubole Dashboards Share data and visualizations with non-qubole and Qubole users. Gain insights into your data with a configured Notebook Provide enhanced functionality with a Spark Notebook interpreter configured for a specific use case Configured Spark Notebook running code from a local Jupyter Notebook Access to Qubole Spark clusters through a local Jupyter Notebook Shared access to data and visualizations from a Notebook 10

ACTIVATORS: DATA SCIENCE Data Science Activators provide access to the latest machine learning features and functionality within Qubole. Select one of these Activators to supercharge your data analytics. REF TITLE OVERVIEW OUTCOMES DSC01 Compile a Machine Learning Framework (non-gpu) Enable a single non-gpu machine learning framework on one cluster. Machine Learning code running in a Notebook on a Spark Cluster without GPU DSC02 Enable GPU Support for a Machine Learning Framework Enable GPU support for a single machine learning framework on a single cluster. Machine Learning code running in a Notebook on a Spark Cluster with GPU DSC03 Optimize Machine Learning Training Spark Application for Performance Optimize Machine Learning Training Spark Application for Performance. Tested and optimized Spark code (performance improvement not guaranteed) DSC04 Select and Train Machine Learning Models Validate existing machine learning model was properly selected and trained Recommend model or training procedures for selected Machine Learning Notebook DSC05 Migrate Local Machine Learning Model to a Qubole Spark Cluster Migrate a ML Model and provide best practices on ML in a distributed compute environment. Migrated local model into Qubole environment DSC06 Migrate Local Data Preparation to a Qubole Spark Cluster Familiarize with using Spark and distributed computing. Local data preparation process running in the cloud DSC07 Select and Configure Access to a Supplementary Dataset Select and configure access to an open source supplementary dataset to assist in a data science project. Access to open source dataset from within a Machine Learning model running in Qubole 11

ACTIVATORS: INTEGRATIONS Integration Activators offers expanded functionality for big data users by partnering with other solutions. Select any one of these Activators to speed up data access or run advanced data transformation and machine learning on data warehouses. REF TITLE OVERVIEW PARTNER OUTCOMES INT01 Integrate with Snowflake Use the Qubole UI, API, and notebooks to read and write data to the Snowflake data warehouse by using the Qubole Dataframe API for Apache Spark. Snowflake Configured Snowflake Data Store and tested Spark code utilizing Snowflake integration INT02 Integrate with Datadog Monitor, troubleshoot, and optimize QDS clusters with Datadog. Datadog Datadog access to Qubole cluster metrics INT03 Integrate with Tableau Deliver faster analytics, visualization and business intelligence leveraging Big Data stored in the cloud. INT04 Integrate with Looker Implement self-service data exploration and visualization without having to rely on a Data Science team. Looker makes the data in Qubole available to everyone across an organization regardless of technical ability for data exploration, visualization, or sharing. Tableau Looker Expanded analytic capabilities and Tableau access to data through a Qubole cluster Expanded analytic capabilities and Looker access to data through a Qubole cluster INT05 Integrate with Apache Kafka This Activator will assist with integrating Spark Qubole Cluster with Kafka. INT06 Integrate with Amazon Kinesis This Activator will assist with integrating Spark Qubole Cluster with Kinesis. - Real-time ingestion of data from Kafka. - Real-time ingestion of data from Kinesis. INT07 Integrate with Talend Learn how to use Talend Studio and Qubole to integrate data from various sources into the cloud data lake, and build the data quality workflows to clean, mask, and transform data as per the business requirements. Talend Talend configured to utilize Qubole clusters and access data 12

ACTIVATORS: MISCELLANEOUS Miscellaneous Activators cover a broad range of topics and are designed to cover our ad-hoc user needs. REF TITLE OVERVIEW OUTCOMES MSC01 Qubole Ask Me Anything (AMA) Gain access to Qubole subject matter experts over the course of four two-hour sessions. Four completed AMA sessions. 13

Ready to get started with the activation of your data. Please contact your account team or email us at sales@qubole.com to learn more and get started with Activators. Corporate Headquarters 469 El Camino Real Suite 205 Santa Clara, CA 95050 (855) 423-6674 www.qubole.com About Qubole Qubole is revolutionizing the way companies activate their data the process of putting data into active use across their organizations. With Qubole s cloud-native Big Data Activation Platform, companies exponentially activate petabytes of data faster, for everyone and any use case, while continuously lowering costs. Qubole overcomes the challenges of expanding users, use cases, and variety and volume of data while constrained by limited budgets and a global shortage of big data skills. Qubole s intelligent automation and self-service supercharge productivity, while workload-aware auto-scaling and real-time spot buying drive down compute costs dramatically. Qubole s platform delivers freedom of choice, eliminating legacy lock in use any engine and any tool to match your company s needs. 2018 Qubole version 1.1