Tomasz Libera. Azure SQL Data Warehouse

Thanks to our partners!

About me
- Microsoft MVP, Data Platform
- Microsoft Certified Trainer
- SQL Server Developer
- Academic Trainer
- datacommunity.org.pl: one of the leaders of Data Community Poland
- Organizer of conferences: SQLDay, SQLSaturday
- Interests: mountain biking, MTB marathons
- tomasz.libera@datacommunity.pl
- blog.libera.net.pl

Agenda
- Introduction: MPP, DWU, distributions, Gen2
- Creating tables and differences in T-SQL: CTAS, distribution methods, indexes, statistics
- Monitor and tune: join query performance
- Data loading: bcp, AzCopy, PolyBase, SSIS

Introduction

MPP Architecture
- MPP: massively parallel processing
- PDW (Parallel Data Warehouse) -> APS (Analytics Platform System)
- One control node + up to 60 compute nodes
- Distributed queries: all queries are distributed across the nodes
- Data Movement Service: internal service that moves data across the nodes as necessary to run queries in parallel and return accurate results
- T-SQL: tables, views, stored procedures, temporary tables, variables
- Columnstore architecture: default index type for all tables
- Microsoft PolyBase: lets you query data in Hadoop and Blob Storage using T-SQL

Dynamically scale
- Pause and resume compute to save cost; while compute is paused you are still charged for storage
- Scale compute in a few minutes
- Service level: Data Warehouse Unit (DWU), the unit of compute scale (CPU, memory, IO operations)
- DW100: 1,28 EUR/h; DW6000: 76,51 EUR/h

Tool            Pause/Resume   Scale
Azure portal    Yes            Yes
PowerShell      Yes            Yes
REST API        Yes            Yes
T-SQL           No             Yes
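As a rough illustration of what pausing saves, the sketch below multiplies the hourly compute prices quoted on this slide by the number of hours the warehouse is running. It assumes compute cost is simply the hourly rate times running hours and leaves storage out; the function name and the usage pattern (10 hours on 22 working days) are hypothetical.

```python
from decimal import Decimal

# Hourly compute prices quoted on the slide (EUR per hour).
PRICE_EUR_PER_HOUR = {"DW100": Decimal("1.28"), "DW6000": Decimal("76.51")}


def compute_cost(service_level: str, running_hours: int) -> Decimal:
    """Compute-only cost, assuming cost = hourly rate x hours running.

    Storage is billed separately and is not included here.
    """
    return PRICE_EUR_PER_HOUR[service_level] * running_hours


always_on = compute_cost("DW100", 24 * 30)     # running for a 30-day month
office_hours = compute_cost("DW100", 10 * 22)  # paused outside working hours
print(always_on, office_hours, always_on - office_hours)
```

Decimal is used instead of float so that the currency amounts are not affected by binary rounding.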

DW capacity limits

GEN 1                             DW100     DW400     DW1000    DW6000
Compute nodes                     1         4         10        60
Distributions per compute node    60        15        6         1
Memory per data warehouse (GB)    24        96        240       1440
Price per hour                    1,28      5,10      12,75     76,51

GEN 2                             DW500c    DW2000c   DW5000c   DW30000c
Compute nodes                     1         4         10        60
Distributions per compute node    60        15        6         1
Memory per data warehouse (GB)    300       1200      3000      18000
Price per hour                    6,38      25,50     63,75     382,52

Distributed tables
- Every row in a table is assigned to a distribution (a distributed storage location)
- Distributions are grouped onto compute nodes
- The number of distributions in SQL Data Warehouse is static: 60, at every DWU level
- Higher DWU = more compute nodes = fewer distributions assigned to each compute node
- DW6000/DW30000c: 1 compute node = 1 distribution
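The relationship between compute nodes and distributions is plain division, which the short sketch below reproduces for the node counts listed in the capacity table. The function name is hypothetical, and the even-division check is an assumption that happens to hold for every node count shown in the slides.

```python
TOTAL_DISTRIBUTIONS = 60  # fixed at every DWU level, per the slide


def distributions_per_node(compute_nodes: int) -> int:
    """Number of distributions served by each compute node."""
    if compute_nodes < 1 or TOTAL_DISTRIBUTIONS % compute_nodes != 0:
        raise ValueError("compute_nodes must divide 60 evenly")
    return TOTAL_DISTRIBUTIONS // compute_nodes


# Compute node counts taken from the capacity table: 1, 4, 10 and 60.
for nodes in (1, 4, 10, 60):
    print(nodes, "node(s):", distributions_per_node(nodes), "distributions each")
```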

Creating databases
- Tools: Azure portal, T-SQL, PowerShell
- Database name: unique within the SQL Server that hosts Azure SQL Database and SQL Data Warehouse
- Collation: Windows or SQL collation; default: SQL_Latin1_General_CP1_CI_AS
- Maximum database size: 250 GB to 240 TB, 10 TB by default; Gen2: unlimited columnar storage
- Edition: datawarehouse (the only option for Azure SQL Data Warehouse)

Gen2
- 5x query performance via adaptive caching technology: an NVMe solid-state disk cache keeps the most frequently accessed data close to the CPUs (compute nodes)
  - NVMe SSD: 3 GB/s throughput, 0,02 ms latency
  - SATA SSD: 500 MB/s throughput, 0,2 ms latency
- Improvement in serving concurrent queries (from 32 to 128 queries per cluster)
  - Amazon Redshift maximum concurrent queries: 50
- Unlimited columnar storage
- Offers the greatest level of scale: up to 30,000 Data Warehouse Units (Gen1 maximum: DW6000)
- Microsoft recommends migrating to Gen2 SQL Data Warehouse

DEMO 1
- Create database
- PDW_SHOWSPACEUSED()
- CTAS
- Script: Demo10 CreateDB.sql

Tables and differences in TSQL

Create Table As Select (CTAS)
- Fully parallelized operation
- Creates a new table based on the output of a SELECT statement
- CTAS is the simplest and fastest way to create a copy of a table
Use CTAS to:
- Re-create a table with a different hash distribution column
- Re-create a table as replicated
- Create a columnstore index on just some of the columns in the table
- Query or import external data
- Use partitioning
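The first two uses listed above (re-creating a table with a different distribution) can be sketched as T-SQL text. To keep the example checkable without a database, the statement is assembled in a small Python helper; the helper name and the table and column names (dbo.FactSales, dbo.DimCustomer, CustomerKey) are hypothetical, and identifiers are inserted as-is, so this is for trusted input only.

```python
def ctas_statement(new_table: str, source_table: str, distribution: str) -> str:
    """Return T-SQL text for a CTAS copy of source_table.

    distribution is the value of the DISTRIBUTION table option, for example
    'HASH(CustomerKey)', 'ROUND_ROBIN' or 'REPLICATE'.
    """
    return (
        f"CREATE TABLE {new_table}\n"
        f"WITH (DISTRIBUTION = {distribution}, CLUSTERED COLUMNSTORE INDEX)\n"
        f"AS SELECT * FROM {source_table};"
    )


# Re-create a table with a different hash distribution column.
print(ctas_statement("dbo.FactSales_hash", "dbo.FactSales", "HASH(CustomerKey)"))
# Re-create a table as replicated.
print(ctas_statement("dbo.DimCustomer_repl", "dbo.DimCustomer", "REPLICATE"))
```

After a copy like this, the old table is typically dropped and the new one renamed into its place.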

Distribution methods
Each database is divided into 60 distributions; a table's rows are spread across them using one of 3 methods:
- ROUND ROBIN (default): distributes rows evenly but randomly; doesn't require knowledge about the data or the queries
- HASH DISTRIBUTION: rows are distributed using a hash algorithm, so equal values go to the same distribution; optimal for large tables
- REPLICATED TABLES: all data is present on every node, which simplifies many query plans and reduces data movement; best for small lookup tables

Distribution method: ROUND ROBIN (default)
- The assignment of rows to distributions is random
- Rows with equal values are not guaranteed to be assigned to the same distribution
When to use:
- There is no obvious joining key
- There is no good candidate column for hash-distributing the table
- The table does not share a common join key with other tables
- The join is less significant than other joins in the query
- The table is a temporary staging table

Distribution method: HASH DISTRIBUTION
- Distributes table rows across the compute nodes by using a deterministic hash function
- Hash column: static, with many unique values
When to use:
- Table size on disk is more than 2 GB

Distribution method: REPLICATED TABLES
- All data is present on every node, which simplifies many query plans and reduces data movement; best for small lookup tables
- A replicated table caches a full copy of the table on each compute node. Consequently, replicating a table removes the need to transfer data among compute nodes before a join or aggregation.
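To see why a hash column should have many unique values, the following sketch places 60,000 rows into 60 distributions twice: once by hashing a column with only three distinct values, once by round robin. SHA-256 is only a stand-in for the service's hash function, which the slides do not specify, and round robin is modelled as simple cycling rather than random assignment; the sample values are made up.

```python
import hashlib
from collections import Counter

DISTRIBUTIONS = 60


def hash_distribution(value) -> int:
    """Deterministically map a column value to one of the 60 distributions.

    SHA-256 is a stand-in; the real hash function is not given in the slides.
    """
    digest = hashlib.sha256(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % DISTRIBUTIONS


def round_robin_distribution(row_number: int) -> int:
    """Spread rows evenly regardless of their values (modelled as cycling)."""
    return row_number % DISTRIBUTIONS


# 60,000 rows whose candidate distribution column has only 3 distinct values.
rows = [("PL", "DE", "US")[i % 3] for i in range(60_000)]

by_hash = Counter(hash_distribution(value) for value in rows)
by_round_robin = Counter(round_robin_distribution(i) for i in range(len(rows)))

print("distributions used by hash:", len(by_hash))
print("rows in largest hash distribution:", max(by_hash.values()))
print("rows in largest round-robin distribution:", max(by_round_robin.values()))
```

With three distinct values the hash method can fill at most three distributions, leaving the others idle, while round robin fills all 60 evenly but gives no guarantee that equal values end up together.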

Indexes
- Clustered columnstore index (default)
- Clustered index (rowstore, for more selective queries)
- Nonclustered index (rowstore)
- Heap (faster row insert)
When NOT to use the default columnstore index:
- Unsupported data types: varchar(max), nvarchar(max)
- Temporary tables
- Small tables (< 100 million rows)

Statistics
- Statistical information about the distribution of values in a table or index
- Created on one or more columns
- SQL Data Warehouse supports auto create statistics (May 2018)

Temporary tables
- Rows are visible only in the current session
- Dropped when the user logs out
SQL Data Warehouse and SQL Server temp tables:
- Similarities:
  - A temp table created BEFORE execution of a stored procedure is visible within the procedure
- Differences:
  - A temp table created within a stored procedure is visible after the procedure has executed
  - Usually created by a CTAS statement
  - Can be indexed (heap, clustered columnstore, clustered)

Partitioning
- Improves query performance
- Speeds up loading and archiving of data (partition switching)
- Allows maintenance operations on individual partitions instead of the whole table
- Remember that in SQL Data Warehouse all tables are already divided into 60 distributions
- Simpler syntax than SQL Server: no partition scheme or partition function

Not supported
- Identity (available from June 2017)
- Sequences
- Primary Key, Foreign Key, Unique, Check
- Unique indexes
- Computed columns
- Sparse columns
- User-defined data types
- Triggers
- Indexed views
- Synonyms

IDENTITY
- IDENTITY column property supported since June 2017
- Not supported:
  - @@IDENTITY and SCOPE_IDENTITY functions
  - Hash distribution where the IDENTITY column is also the distribution key
  - External tables
- Doesn't guarantee the order in which the surrogate values are allocated
- Supported: IDENTITY_INSERT ON

DEMO 2
- Distributions (round_robin, hash, replicate): Demo20 Create tables.sql
- Identity: Demo21 Identity.sql
- Indexes: Demo22 Indexes.sql
- Statistics: Demo23 Statistics.sql
- Temporary tables: Demo24 Temp tables.sql
- Partitioning: Demo25 Partitioning.sql

Monitor and tune

LABEL
- Query hint that assigns a comment to a query
- Simplifies the monitoring process
- Makes it easy to find the query in the sys.dm_pdw_exec_requests DMV
- Use brackets when querying the label column, as it is a keyword
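A minimal sketch of the label workflow: one helper appends the OPTION (LABEL = ...) hint to a statement, the other builds the lookup against sys.dm_pdw_exec_requests with the label column in brackets. The T-SQL is kept as text in Python so it can be checked without a connection; both helper names, the sample query, and the label text are hypothetical, and the label is inserted as-is, so it must not contain quotes.

```python
def with_label(query: str, label: str) -> str:
    """Append the LABEL query hint to a T-SQL statement."""
    body = query.rstrip().rstrip(";")
    return f"{body}\nOPTION (LABEL = '{label}');"


def find_by_label(label: str) -> str:
    """Build the monitoring query; [label] is bracketed because it is a keyword."""
    return (
        "SELECT request_id, status, total_elapsed_time, command\n"
        "FROM sys.dm_pdw_exec_requests\n"
        f"WHERE [label] = '{label}';"
    )


print(with_label("SELECT COUNT(*) FROM dbo.FactSales;", "Demo: row count"))
print(find_by_label("Demo: row count"))
```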

sys.dm_pdw_exec_requests
- Contains the last 10,000 executed queries
- One row = one request or query
- Status: 'Running', 'Suspended', 'Completed', 'Cancelled', 'Failed'
- Resource_class: predetermined resource limits that govern compute resources and concurrency for query execution
- Resource classes are implemented as predefined database roles

sys.dm_pdw_request_steps
- All steps that compose a given request or query
- One row = one query step
- Operation_type:
  - DMS query plan operations (selected)
  - SQL query plan operations ('OnOperation', 'RemoteOperation')
  - Other query plan operations

Data Movement Service
- DMS: data transport technology that coordinates data movement between the compute nodes
- Some queries require data movement to ensure that the parallel queries return accurate results
- When data movement is required, DMS ensures the right data gets to the right location

Data movement between nodes
[Figure: a Sales table (ProductKey, OrderDateKey, CustomerKey, SalesAmount) and two Customer tables (CustomerKey, Firstname, Lastname) spread over four nodes; the label DISTRIBUTION COLUMN appears twice.]

Node     Sales row                            Customer row (first table)   Customer row (second table)
NODE 1   488  20181012  24604  53,99          15460 Victoria Cooper        24604 Danny Travers
NODE 2   371  20181019  15460  2181,5625      18125 Eduardo Turner         15460 Victoria Cooper
NODE 3   381  20181021  18125  1000,4375      11264 Isabella Allen         18125 Eduardo Turner
NODE 4   228  20181022  11264  49,99          24604 Danny Travers          11264 Isabella Allen

In the second Customer table each row sits on the node whose Sales row has the same CustomerKey.
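The slide's example can be read as a small placement problem: a Customer row has to be moved whenever it is stored on a different node than the Sales row it joins to. The sketch below uses the node assignments from the figure under that reading, which is an interpretation of the flattened diagram; the function name is hypothetical, and the real DMS chooses among several movement operations that are not modelled here.

```python
# Node holding each Sales row, keyed by its CustomerKey (values from the figure).
sales_node = {24604: 1, 15460: 2, 18125: 3, 11264: 4}
# Node holding each Customer row before the join (first Customer table).
customer_node = {15460: 1, 18125: 2, 11264: 3, 24604: 4}


def required_moves(sales_placement: dict, customer_placement: dict) -> list:
    """Return (CustomerKey, from_node, to_node) for every Customer row that is
    not on the same node as the Sales row it joins to."""
    return [
        (key, customer_placement[key], target)
        for key, target in sales_placement.items()
        if customer_placement[key] != target
    ]


moves = required_moves(sales_node, customer_node)
print(len(moves), "rows to move:", moves)

# If both tables were distributed on CustomerKey, matching rows would already
# share a node and nothing would need to move.
co_located = required_moves(sales_node, dict(sales_node))
print(len(co_located), "rows to move when co-located")
```

A replicated Customer table has the same effect as the co-located case for this join, because every node already holds every Customer row.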

DEMO 3: Monitor and tune
- sys.dm_pdw_exec_requests
- sys.dm_pdw_request_steps
- Join query performance: hash, round_robin, replicate

Data loading

Data loading
- bcp: export to a text file from SQL Server, import from the text file into SQL Data Warehouse
- PolyBase (external tables):
  1. bcp export to a flat file
  2. AzCopy
  3. PolyBase
- SSIS:
  - ADO.NET / OLE DB source and destination
  - Azure Blob Upload Task
  - Azure SQL DW Upload Task
- Redgate Data Platform Studio
- Other (Azure Data Lake Store, Azure Data Factory)

bcp
Export/import process:
- From a text file to SQL Data Warehouse
- From SQL Data Warehouse to a text file

PolyBase
- Microsoft PolyBase lets you query data in Hadoop and Blob Storage using T-SQL
- With PolyBase, the data loads in parallel from the data source directly to the compute nodes
- The best (and fastest) method to load data into SQL Data Warehouse
- Data should first be loaded into Azure Blob Storage
- The higher the DWU, the faster the import
- https://docs.microsoft.com/en-us/azure/sql-data-warehouse/design-elt-data-loading

PolyBase step by step
1. bcp/SSIS: export from SQL Server to a text file
2. AzCopy: copy the data to Azure Blob Storage
3. DATABASE SCOPED CREDENTIAL: access to Azure Blob Storage
4. EXTERNAL DATA SOURCE: data source referencing the credential from the previous step
5. EXTERNAL FILE FORMAT and EXTERNAL TABLE: file format and external table definition
6. Load the data into a new table using a CTAS statement
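Steps 3 to 6 above map onto five T-SQL statements, sketched here as text assembled in Python so the order and the cross-references between objects can be checked without a live warehouse. Every object name (BlobCredential, BlobSource, PipeDelimitedText, dbo.Sales_ext, dbo.Sales), the column list, the pipe delimiter, and the storage account, container and folder arguments are hypothetical placeholders; the secret is deliberately left as a placeholder.

```python
def polybase_load_script(account: str, container: str, folder: str) -> list:
    """Return the T-SQL statements for steps 3 to 6, in execution order."""
    location = f"wasbs://{container}@{account}.blob.core.windows.net"
    return [
        # Step 3: credential for the storage account. A database master key
        # must already exist before a scoped credential can be created.
        "CREATE DATABASE SCOPED CREDENTIAL BlobCredential\n"
        "WITH IDENTITY = 'user', SECRET = '<storage-account-key>';",
        # Step 4: data source that references the credential.
        "CREATE EXTERNAL DATA SOURCE BlobSource\n"
        f"WITH (TYPE = HADOOP, LOCATION = '{location}', CREDENTIAL = BlobCredential);",
        # Step 5a: how the exported text files are laid out.
        "CREATE EXTERNAL FILE FORMAT PipeDelimitedText\n"
        "WITH (FORMAT_TYPE = DELIMITEDTEXT, FORMAT_OPTIONS (FIELD_TERMINATOR = '|'));",
        # Step 5b: external table over the files in the given folder.
        "CREATE EXTERNAL TABLE dbo.Sales_ext (\n"
        "    ProductKey INT, OrderDateKey INT, CustomerKey INT, SalesAmount DECIMAL(19, 4)\n"
        ")\n"
        f"WITH (LOCATION = '{folder}', DATA_SOURCE = BlobSource, FILE_FORMAT = PipeDelimitedText);",
        # Step 6: parallel load into a new distributed table with CTAS.
        "CREATE TABLE dbo.Sales\n"
        "WITH (DISTRIBUTION = HASH(CustomerKey), CLUSTERED COLUMNSTORE INDEX)\n"
        "AS SELECT * FROM dbo.Sales_ext;",
    ]


for statement in polybase_load_script("myaccount", "mycontainer", "/sales/"):
    print(statement, end="\n\n")
```

The statements have to run in this order because each one refers to an object created by an earlier one.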

SQL Server Integration Services Feature Pack for Azure
- Azure Blob Upload Task: replaces AzCopy

SQL Server Integration Services Feature Pack for Azure
- Azure SQL DW Upload Task: uploads the data in a text file to Blob Storage, then uses PolyBase to load it into a table in SQL Data Warehouse

DEMO 4
- bcp: Demo40 BCP.sql
- PolyBase: Demo41 Polybase.sql
- SSIS: Demo42 SSIS.sql

THANK YOU! tomasz.libera@datacommunity.pl @tomasz_libera Slides, demos: http://bit.ly/sqlsatbanialuka_asdw