Modern Data Warehouse The New Approach to Azure BI


History

On-Premise SQL Server; Big Data Solutions; Technical Barriers; Modern Analytics Platform

On-Premise SQL Server; Big Data Solutions; Modern Analytics Platform

What is a modern data warehouse?

Source: Russom, P. (2013) The Modern Data Warehouse: What Enterprises Must Have Today and What They'll Need in the Future, TDWI

Data Analysis Paradigm Shift

OLD WAY: Structure -> Ingest -> Analyze
NEW WAY: Ingest -> Analyze -> Structure

This solves the two biggest reasons why many EDW projects fail:
- Too much time spent modeling when you don't know all of the questions your data needs to answer
- Wasted time spent on ETL where the net effect is a star schema that doesn't actually show value

Data lake is the center of a big data solution

A storage repository that holds a vast amount of raw data in its native format until it is needed.

- Inexpensively store unlimited data
- Collect all data "just in case"
- Store data with no modeling
- Schema on read
- Complements EDW
- Frees up expensive EDW resources
- Quick user access to data
- ETL Hadoop tools
- Easily scalable
- Active archive (federated queries)
- Data Science workspaces
- Areas of curated data
- Supports structured, semi-structured and unstructured data
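To make "schema on read" concrete, here is a minimal sketch in plain Python. The event fields, the `SALES_SCHEMA` mapping and the `read_with_schema` helper are hypothetical names chosen for illustration; they are not from the slides. The raw lines are stored untouched, and a schema is applied only when the data is read.

```python
import json

# Raw events land in the lake exactly as they arrive: no upfront modeling.
RAW_EVENTS = [
    '{"id": "1", "amount": "19.90", "country": "DK"}',
    '{"id": "2", "amount": "5", "channel": "web"}',
]

# Hypothetical schema, decided only once a question is asked of the data.
SALES_SCHEMA = {"id": int, "amount": float, "country": str}


def read_with_schema(lines, schema):
    """Parse raw JSON lines and project them onto `schema` at read time."""
    rows = []
    for line in lines:
        record = json.loads(line)
        row = {}
        for field, cast in schema.items():
            value = record.get(field)
            # Missing fields become None; extra fields are simply ignored.
            row[field] = cast(value) if value is not None else None
        rows.append(row)
    return rows


rows = read_with_schema(RAW_EVENTS, SALES_SCHEMA)
```

A different question can define a different schema over the same raw lines, which is what lets the structuring step come last.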

Data Lake layers

- Raw data layer: Raw events are stored for historical reference. Also called the staging layer or landing area.
- Cleansed data layer: Raw events are transformed (cleaned and mastered) into directly consumable data sets. The aim is to standardize the way files are stored in terms of encoding, format, data types and content (e.g. strings). Also called the conformed layer.
- Application data layer: Business logic is applied to the cleansed data to produce data ready to be consumed by applications (e.g. a DW application or an advanced analysis process). Also known by many other names: workspace, trusted, gold, secure, production ready, governed.
- Sandbox data layer: Optional layer to be used to "play" in. Also called the exploration layer or data science workspace.

Data governance is still needed so that your data lake does not turn into a data swamp!
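The flow between layers can be sketched as follows. This is only an illustration under simplifying assumptions: the lake is an in-memory dict rather than a file store, and the `cleanse` and `apply_business_logic` functions and the sample records are hypothetical.

```python
from collections import defaultdict


def cleanse(raw_events):
    """Raw -> cleansed: standardize string content and data types."""
    cleansed = []
    for event in raw_events:
        cleansed.append({
            "customer": event["customer"].strip().title(),
            "amount": float(event["amount"]),
        })
    return cleansed


def apply_business_logic(cleansed_rows):
    """Cleansed -> application: total amount per customer."""
    totals = defaultdict(float)
    for row in cleansed_rows:
        totals[row["customer"]] += row["amount"]
    return dict(totals)


# The raw layer is kept as-is for historical reference.
lake = {"raw": [
    {"customer": " alice ", "amount": "10.5"},
    {"customer": "ALICE", "amount": "4.5"},
    {"customer": "bob", "amount": "3"},
]}
lake["cleansed"] = cleanse(lake["raw"])
lake["application"] = apply_business_logic(lake["cleansed"])
```

Each layer is derived from the one before it, so the application layer can be rebuilt from raw data if the cleansing rules or business logic change.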

Data platform continuum

[Figure: continuum of deployment options (on-premises, hybrid cloud, off-premises), with labels: shared / lower cost, dedicated / higher cost, higher administration, lower administration]

SMP vs MPP

SMP - Symmetric Multiprocessing
- Multiple CPUs used to complete individual processes simultaneously
- All CPUs share the same memory, disks, and network controllers (scale-up)
- All SQL Server implementations up until now have been SMP
- Mostly, the solution is housed on a shared SAN

MPP - Massively Parallel Processing
- Uses many separate CPUs running in parallel to execute a single program
- Shared Nothing: each CPU has its own memory and disk (scale-out)
- Segments communicate using a high-speed network between nodes

Microsoft SMP options

On-premises SMP (Data Warehouse Fast Track or custom)
- Full SQL Server surface area.
- Known, deployed, owned by customer.
- 5TB to 145+ TB compute; 5TB to 1.2 PB+ storage.

Cloud SMP (SQL Server 2016 Fast Track for Azure VMs)
- Full SQL Server surface area.
- Known, deployed by customer, hosted by Microsoft.
- Certified VM sizes include GS5 (32 cores, 448GB memory, 64TB).
- Certified to 16 TB storage.

Integrate with non-relational data
- Hadoop, Cloudera, Hortonworks, MapR.
- Language translation: SQL Server 2016 PolyBase.

[Figure labels]
- On-premises: SQL Server 2016 Data Warehouse Fast Track, Analytics Platform System, third-party Hadoop distributions
- Cloud, relational: Azure SQL Data Warehouse, SQL Server in Azure VMs, SQL Server 2016 Fast Track for Azure VMs
- Cloud, beyond relational: Azure Data Lake, Azure HDInsight, Azure Marketplace
- PolyBase, Insights, Flexibility

Options to store and process data

MPP Architecture

- Control Node: Interacts with apps & connections; coordinates activities of the compute nodes.
- Compute Nodes: Provide the computational engines to process data.
- Distributions: Every row of data is stored in a distribution. The method of distributing data is critical to achieving good performance.
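Why the distribution method matters can be illustrated with a small sketch. It assumes hash distribution on a chosen column and 60 distributions (the count Azure SQL Data Warehouse uses, which is not stated on the slide). MD5 is only a stand-in for the engine's internal hash function, and `distribution_for` and `skew` are hypothetical helper names.

```python
import hashlib
from collections import Counter

NUM_DISTRIBUTIONS = 60  # assumption: the count used by Azure SQL Data Warehouse


def distribution_for(key, num_distributions=NUM_DISTRIBUTIONS):
    """Map a distribution-column value to a distribution (stand-in hash)."""
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_distributions


def skew(keys):
    """Rows in the fullest distribution divided by the ideal even share."""
    counts = Counter(distribution_for(k) for k in keys)
    ideal = len(keys) / NUM_DISTRIBUTIONS
    return max(counts.values()) / ideal


# High-cardinality column (e.g. an order id): rows spread fairly evenly.
high_cardinality = skew(list(range(6000)))

# Low-cardinality column (e.g. a country code): a few distributions do all the work.
low_cardinality = skew(["DK", "SE", "NO"] * 2000)
```

A skew close to 1 means every compute node gets a similar share of rows. A large skew means a query waits on the few distributions that hold most of the data, so a column with many distinct, evenly used values is usually the better choice.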

PolyBase

Query relational and non-relational data with T-SQL.

PolyBase is interactive while U-SQL is batch. PolyBase extends T-SQL onto data via views, while U-SQL natively operates on data, virtualizes access to other SQL data sources (no metadata needed), and supports more formats (JSON) and libraries/UDOs.

When to consider a Virtual Machine

Consider when you want to:
- Closely resemble a traditional DW implementation
- Run an SMP DB larger than Azure SQL DB supports
- Quickly migrate an existing solution to the cloud
- Run the software or DB platform of your choice with full feature parity
- Run all aspects of SQL Server (SSIS, SSAS MD, MDS)
- Have full control & administer all aspects

When to consider a SQL DB

Consider when you want to:
- Create a new DW solution
- Run a small to medium-sized DW workload (up to 4TB currently)
- Take advantage of PaaS & reduced administration effort
- Optionally utilize automatic tuning features

When to consider an Azure SQL DW

Consider when you want to:
- Run a large-size DW solution (1-4TB+)
- Scale up/down, or pause, based on demand
- Integrate with multistructured data
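The three "when to consider" slides can be condensed into a rough rule of thumb. The sketch below is one possible reading of them, not an official sizing guide: the `suggest_platform` function and its parameter names are hypothetical, and the 1 TB and 4 TB thresholds come from the slides, which describe them as current at the time.

```python
def suggest_platform(size_tb, needs_full_sql_server=False,
                     needs_elastic_scale=False, multistructured=False):
    """Return a rough platform suggestion based on the slide criteria."""
    # Full control, or features such as SSIS / SSAS MD / MDS: use a VM.
    if needs_full_sql_server:
        return "SQL Server in an Azure VM"
    # Beyond the SQL DB size limit quoted on the slide (4 TB).
    if size_tb > 4:
        return "Azure SQL Data Warehouse"
    # In the 1-4 TB overlap, scaling/pausing or multistructured data tip it to SQL DW.
    if size_tb >= 1 and (needs_elastic_scale or multistructured):
        return "Azure SQL Data Warehouse"
    # Small to medium workloads with reduced administration effort.
    return "Azure SQL Database"


small = suggest_platform(0.5)
large = suggest_platform(10)
lift_and_shift = suggest_platform(10, needs_full_sql_server=True)
```

The 1-4 TB range overlaps on the slides, so in that band the choice depends on the other criteria rather than on size alone.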

Knowing the various big data solutions

[Figure: big data storage and analytics options, arranged from control to ease of use and reduced administration]

Big data analytics:
- Azure Marketplace (HDP, CDH, MapR): any Hadoop technology, any distribution (IaaS clusters)
- Azure HDInsight: workload optimized, managed clusters (managed clusters)
- Azure Databricks: frictionless & optimized Spark clusters
- Azure Data Lake Analytics: data engineering in a job-as-a-service model (big data as-a-service)

Big data storage:
- Azure Data Lake Store
- Azure Storage

Azure Databricks

[Figure: Azure Databricks overview]
- Users: data engineer, data scientist, business analyst
- Collaborative Workspace
- Deploy Production Jobs & Workflows: multi-stage pipelines, job scheduler, notification & logs
- Optimized Databricks Runtime Engine: Databricks I/O, Apache Spark, serverless, REST APIs
- Connected sources: IoT / streaming data, cloud storage, data warehouses, Hadoop storage
- Connected outputs: machine learning models, BI tools, data exports, data warehouses
- Enhance productivity; build on secure & trusted cloud; scale without limits

Evolving to a Modern Data Warehouse

Realise business value from the data

Common Data Service for Analytics

CDS for Analytics Resources and Video Links

https://powerbi.microsoft.com/en-us/blog/coming-soon-to-power-bi-common-data-service-for-analytics/
https://www.youtube.com/watch?v=xaa5c1bowpe
https://www.youtube.com/watch?v=1vq0hlnz06a

Resources

https://azure.microsoft.com/en-us/blog/technical-reference-implementation-for-enterprise-bi-and-reporting/
https://www.sqlchick.com/entries/2017/1/9/defining-the-components-of-a-modern-data-warehouse-a-glossary
http://www.jamesserra.com/archive/2014/12/the-modern-data-warehouse/
https://skylandtech.net/2014/09/22/a-modern-data-warehouse-architecture-part-1-add-a-data-lake/

Thank you