Azure Data Lake Analytics Introduction for SQL Family. Julie

Azure Data Lake Analytics Introduction for SQL Family
Julie Koesmarno, @MsSQLGirl
www.mssqlgirl.com
jukoesma@microsoft.com

"What we have is a data glut." Vernor Vinge (Emeritus Professor of Mathematics at San Diego State University)

The Data Lake Approach

CLOUD MOBILE

Traditional data warehousing approach
Steps shown in the diagram:
  Understand Corporate Strategy
  Gather Requirements: Business Requirements, Technical Requirements
  Implement Data Warehouse:
    BI and analytics: Reporting & Analytics Design, Reporting & Analytics Development
    Data warehouse: Dimension Modelling, Physical Design
    ETL: ETL Design, ETL Development
    Data sources: Setup Infrastructure, Install and Tune

The Data Lake approach
Ingest all data regardless of requirements
Store all data in native format without schema definition
Do analysis using analytic engines like Hadoop
Diagram labels: Devices; Batch queries; Interactive queries; Real-time analytics; Machine Learning; Data warehouse

How Microsoft has used Big Data
MICROSOFT DOUBLES SEARCH SHARE
We needed to better leverage data and analytics to win in search.
We changed our approach: more experiments by more people!
So we:
  Built an exabyte-scale data lake for everyone to put their data.
  Built tools approachable by any developer.
  Built machine learning tools for collaborating across large experiment models.
[Chart: search share by year, 2009-2015; data labels 9%, 11%, 15%, 16%, 18%, 19%, 20%. Source: ComScore 2009-2015 Search Report US]

Introducing Azure Data Lake: Big Data Made Easy

Azure Data Lake as part of Cortana Intelligence Suite
Data Sources: Apps; Sensors and devices
Information Management: Data Factory; Data Catalog; Event Hubs
Big Data Stores: SQL Data Warehouse; Data Lake Store
Machine Learning and Analytics: Machine Learning; Data Lake Analytics; HDInsight (Hadoop and Spark); Stream Analytics
Intelligence: Cognitive Services; Bot Framework; Cortana
Dashboards & Visualizations: Power BI
People: Web Apps; Mobile; Bots
Automated Systems
Flow: Data, Intelligence, Action

Azure Data Lake Analytics

Why ADLA? Use Cases
Digital Crime Unit: Analyze complex attack patterns to understand botnets, and predict and mitigate future attacks, by analyzing log records with complex custom algorithms.
Image Processing: Large-scale image feature extraction and classification using custom code.
Shopping Recommendation: Complex pattern analysis and prediction over shopping records using proprietary algorithms.

Why ADLA?
ADLA enables customers to leverage existing experience with C#, SQL & PowerShell.
ADLA offers convenience, efficiency, automatic scale, and management in a job service form factor.

Azure Data Lake Analytics
Start in seconds
Scale instantly
Pay per job
Develop massively parallel programs with simplicity
Debug and optimize your Big Data programs with ease
Virtualize your analytics
Enterprise-grade security, auditing and support

ADL and SQL / Power BI

Work across all cloud data
Azure Data Lake Analytics works across: Azure SQL DW; Azure SQL DB; Azure Data Lake Store; Azure Storage Blobs; SQL DB in an Azure VM

Tools

Get started
1. Log in to Azure
2. Create an ADLA account
3. Write and submit an ADLA job with U-SQL
4. The job reads and writes data from storage (ADLS, Azure Blobs, Azure DB)
Diagram label: 30 seconds

https://github.com/azure/usql

What can you do in the Azure Portal?
Create a new Data Lake Analytics account
Author U-SQL scripts
Submit U-SQL jobs
Cancel running jobs
Provision users who can submit jobs
Visualize usage stats (compute hours)
Visualize job management chart

ADLA billing
https://blogs.msdn.microsoft.com/azuredatalake/2016/10/12/understanding-adl-analytics-unit/
Accounts are FREE!
Pay for the compute resources you want for your queries
Pay for storage separately
Cost = (query_hours * parallelism) * price/hour

USAGE           GA PRICE (STARTING JANUARY 1ST, 2017)*
ADLAU           $2 / hour
Completed Job   Free

*Special monthly commitment discounted pricing available
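The billing formula above can be sketched in a few lines of Python. This is a minimal illustration rather than an official calculator: the function name `adla_job_cost` is hypothetical, the default price is the $2 per ADLAU hour quoted on the slide, and any billing granularity or rounding is not modeled.

```python
def adla_job_cost(query_hours: float, parallelism: int,
                  price_per_adlau_hour: float = 2.0) -> float:
    """Estimate a job's compute cost as (query_hours * parallelism) * price/hour.

    query_hours: wall-clock duration of the job, in hours.
    parallelism: number of ADLAUs allocated to the job.
    price_per_adlau_hour: assumed price; 2.0 is the GA price from the slide.
    """
    if query_hours < 0:
        raise ValueError("query_hours must be non-negative")
    if parallelism < 1:
        raise ValueError("parallelism must be at least 1")
    return query_hours * parallelism * price_per_adlau_hour


# A half-hour job running with 10 ADLAUs at $2 per ADLAU hour.
print(adla_job_cost(0.5, 10))  # 10.0
```

Storage is billed separately, so it is left out of this estimate.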

ADLAU allocation
Example: allocating 10 ADLAUs for a 10 minute job
Cost: 10 min * 10 ADLAUs = 100 ADLAU minutes
Chart legend: blue line = allocated ADLAUs; red line = running ADLAUs; horizontal axis = time
You are paying for the area under the blue line
You are only using the area under the red line
Over-allocation: consider using fewer ADLAUs
Under-allocation: consider using more ADLAUs
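The "area under the line" idea can be made concrete with a small Python sketch. It assumes a hypothetical per-minute sample of how many ADLAUs were actually running; the function names and the sample values are illustrative and do not come from any ADLA API.

```python
from typing import Sequence


def adlau_minutes(allocated_adlaus: int, duration_minutes: int) -> int:
    """Area under the allocation (blue) line: what you pay for."""
    return allocated_adlaus * duration_minutes


def utilization(running_per_minute: Sequence[int], allocated_adlaus: int) -> float:
    """Fraction of the paid ADLAU minutes that were actually used.

    running_per_minute holds one hypothetical sample per minute of the
    number of ADLAUs doing work (the red line).
    """
    if allocated_adlaus < 1 or not running_per_minute:
        raise ValueError("need at least one ADLAU and one sample")
    paid = adlau_minutes(allocated_adlaus, len(running_per_minute))
    used = sum(min(r, allocated_adlaus) for r in running_per_minute)
    return used / paid


# The slide's example: 10 ADLAUs allocated for a 10 minute job.
print(adlau_minutes(10, 10))  # 100

# Made-up usage samples: the job never runs more than 6 ADLAUs at once.
running = [2, 4, 6, 6, 6, 6, 4, 2, 2, 2]
print(utilization(running, 10))  # 0.4
```

A low ratio points to over-allocation. A ratio near 1.0 with the running line pinned at the allocation suggests the job may be under-allocated and could finish sooner with more ADLAUs.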

What can you do with Visual Studio?
Author U-SQL scripts (with C# code)
Debug U-SQL and C# code
Submit and cancel U-SQL jobs
Visualize physical plan of U-SQL query
Visualize and replay progress of job
Fine-tune query performance
Create metadata objects
Browse metadata catalog

How to get going with ADL Tools for Visual Studio Plug-in

Metadata objects
ADL Analytics creates and stores a set of metadata objects in a catalog maintained by a metadata service.
Tables and TVFs are created by DDL statements (such as CREATE TABLE).
Metadata objects can be created directly through the Server Explorer.
Server Explorer tree: Azure Data Lake Analytics account; Databases; Tables; Table valued functions; Jobs; Schemas; Linked storage

Metadata catalog
The metadata catalog can be browsed with the Visual Studio Server Explorer.
Server Explorer lets you:
1. Create new tables, schemas and databases
2. Register assemblies

Metadata Object Model (diagram)
ADLA Account/Catalog
  [1,n] Database
    [1,n] Schema
Objects shown: Credentials; Data Source; Ext. tables; tables; views; TVFs; Procedures; Table Types; Statistics; Clustered Index; partitions
C# objects shown: C# Assemblies; C# Fns; C# UDTs; C# UDAgg; C# Extractors; C# Reducers; C# Processors; C# Applier; C# Combiners; C# Outputters
The multiplicity [0,n] also appears in the diagram.
Legend: User objects; MD Name; C# Name; Contains; Refers to; Implemented and named by

I U-SQL

Status Quo: SQL for Big Data
Declarativity does scaling and parallelization for you
Extensibility is bolted on and not native:
  hard to work with anything other than structured data
  difficult to extend with custom code

Status Quo: Programming Languages for Big Data
Extensibility through custom code is native
Declarativity is bolted on and not native:
  User often has to care about scale and performance
  SQL is 2nd class, embedded within strings
  Often no code reuse / sharing across queries

Why U-SQL
Declarativity and Extensibility are equally native to the language!
Get benefits of both!
Makes it easy for you by unifying:
  Unstructured and structured data processing
  Declarative SQL and custom imperative code (C#)
  Local and remote queries
Increase productivity and agility from Day 1 and at Day 100 for YOU!

The Origins of U-SQL
U-SQL is a next-generation large-scale data processing language combining:
  The declarativity, optimizability, and parallelizability of SQL
  The extensibility, expressiveness, and familiarity of C#
Diagram labels: SCOPE; T-SQL; Hive
Goals: High performance; Scalable; Affordable; Easy to program; Secure

Query data where it lives
Easily query data in multiple Azure data stores without moving it to a single store
Benefits:
  Avoid moving large amounts of data across the network between stores
  Single view of data irrespective of physical location
  Minimize data proliferation issues caused by maintaining multiple copies
  Single query language for all data
  Each data store maintains its own sovereignty
  Design choices based on the need
  Push SQL expressions to remote SQL sources: Projections, Filters, Joins
Diagram: a U-SQL query in Azure Data Lake Analytics queries Azure Data Lake Storage, Azure Storage Blobs, Azure SQL Data Warehouse, Azure SQL DB, and Azure SQL in VMs

U-SQL Language Philosophy
Declarative Query and Transformation Language:
  Uses SQL's SELECT FROM WHERE with GROUP BY/Aggregation, Joins, SQL Analytics functions
  Optimizable, Scalable
Expression-flow programming style:
  Easy to use functional lambda composition
  Composable, globally optimizable
Operates on Unstructured & Structured Data:
  Schema on read over files
  Relational metadata objects (e.g. database, table)
Extensible from ground up:
  Type system is based on C#
  Expression language IS C#
  User-defined functions (U-SQL and C#)
  User-defined Aggregators (C#)
  User-defined Operators (UDO) (C#)
U-SQL provides the Parallelization and Scale-out Framework for Usercode:
  EXTRACTOR, OUTPUTTER, PROCESSOR, REDUCER, COMBINER, APPLIER
Federated query across distributed data sources

Example script from the slide:

    // Make the user code (MyAgg, MyNamespace, MyData) available to the script.
    REFERENCE ASSEMBLY MyDB.MyAssembly;

    // Target table; the remaining definition is elided on the slide.
    CREATE TABLE T(
        cid int,
        first_order DateTime,
        last_order DateTime,
        order_count int,
        order_amount float,
        ...
    );

    // Schema on read: apply a schema to the files while extracting.
    @o = EXTRACT oid int, cid int, odate DateTime, amount float
         FROM "/input/orders.txt"
         USING Extractors.Csv();

    @c = EXTRACT cid int, name string, city string
         FROM "/input/customers.txt"
         USING Extractors.Csv();

    // Expressions are C#: string methods, user functions, user aggregators.
    @j = SELECT c.cid,
                MIN(o.odate) AS firstorder,
                MAX(o.odate) AS lastorder,
                COUNT(o.oid) AS ordercnt,
                AGG<MyAgg.MySum>(o.amount) AS totalamount
         FROM @c AS c
              LEFT OUTER JOIN @o AS o ON c.cid == o.cid
         WHERE c.city.StartsWith("new") && MyNamespace.MyFunction(o.odate) > 10
         GROUP BY c.cid;

    // Write the result with a custom outputter, then load it into the table.
    OUTPUT @j TO "/output/result.txt" USING new MyData.Write();

    INSERT INTO T SELECT * FROM @j;

U-SQL compilation process
Diagram labels: U-SQL metadata service; compiler & optimizer
Compilation output (in job folder), deployed to vertices:
  C# managed dll
  C++ unmanaged dll
  algebra
  other files (system files, deployed resources)

Query Life
Diagram components: Visual Studio; Portal / API; Front-End Service; Job Scheduler & Queue; Compiler; Optimizer; Runtime; Vertex Scheduling

Jobs: states, queue, priority

Job execution graph
After a job is submitted, the progress of its execution through the different stages is shown and updated continuously.
Important stats about the job are also displayed and updated continuously.

Visual Studio: Job states
UX state    Job state                              Description
Preparing   New, Compiling                         The script is being compiled by the Compiler Service.
Queued      Queued                                 All jobs enter the queue.
            Scheduling                             Are there enough ADLAUs to start the job?
            Starting                               If yes, then allocate those ADLAUs for the job.
Running     Running, Finalizing                    The U-SQL runtime is now executing the code on 1 or more ADLAUs or finalizing the outputs.
Ended       Ended (Succeeded, Failed, Cancelled)   The job has concluded.
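The relationship between the detailed job states and the four states shown in the Visual Studio UX can be sketched as a lookup table. This is an illustrative Python sketch only: the grouping follows the order in which the states appear on the slide, and the names `UX_STATE_BY_JOB_STATE` and `ux_state` are hypothetical, not part of any ADLA SDK.

```python
# Detailed job state -> coarse state shown in the Visual Studio UX.
# Grouping is assumed from the slide's layout.
UX_STATE_BY_JOB_STATE = {
    "New": "Preparing",
    "Compiling": "Preparing",
    "Queued": "Queued",
    "Scheduling": "Queued",
    "Starting": "Queued",
    "Running": "Running",
    "Finalizing": "Running",
    "Ended": "Ended",
}


def ux_state(job_state: str) -> str:
    """Return the UX state for a detailed job state, or raise on unknown input."""
    try:
        return UX_STATE_BY_JOB_STATE[job_state]
    except KeyError:
        raise ValueError(f"unknown job state: {job_state!r}") from None


for state in ("Compiling", "Scheduling", "Finalizing"):
    print(state, "->", ux_state(state))
```

An ended job additionally carries a result (Succeeded, Failed, or Cancelled), which this sketch does not model.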

Why does a job get queued?
Local cause
  Possible condition: Not enough containers available to your account
Global cause (very rare)
  Possible conditions: System-wide shortage of containers; System-wide shortage of bandwidth

ADLA & U-SQL Summary

This is why ADL & U-SQL!
Easily processes big data of unknown value
Natively unifies SQL's declarativity and C#'s extensibility
Enterprise-grade security & auditing support
Increases productivity and agility from Day 1 forward for YOU!
Sign up for an Azure Data Lake account at http://www.azure.com/datalake and give us your feedback via http://aka.ms/adlfeedback or at http://aka.ms/u-sql-survey!

Additional Resources
Blogs and community page:
  http://usql.io (U-SQL Github)
  http://blogs.msdn.microsoft.com/mrys/
  http://blogs.msdn.microsoft.com/azuredatalake/
  https://channel9.msdn.com/search?term=u-SQL#ch9Search
Documentation and articles:
  http://aka.ms/usql_reference
  https://azure.microsoft.com/en-us/documentation/services/data-lake-analytics/
  https://msdn.microsoft.com/en-us/magazine/mt614251
ADL forums and feedback:
  http://aka.ms/adlfeedback
  https://social.msdn.microsoft.com/forums/azure/en-US/home?forum=AzureDataLake
  http://stackoverflow.com/questions/tagged/u-sql

Get started today! For more information visit: http://azure.com/datalake

Thank You Redmond!