Data Architectures in Azure for Analytics & Big Data
October 20, 2018
Melissa Coates
Solution Architect, BlueGranite
Microsoft Data Platform MVP
Blog: www.sqlchick.com
Twitter: @sqlchick
Data Architecture
A set of rules, policies, standards, and models that govern and define the type of data collected and how it is used, stored, managed, and integrated within an organization and its database systems.
Source: Techopedia
Data Architecture Components
Source: https://docs.microsoft.com/en-us/azure/architecture/data-guide/big-data/
Microsoft Azure
A public cloud computing platform and infrastructure for building, deploying, and managing Microsoft-specific and third-party software and services through a global network of Microsoft-managed datacenters.
Source: http://azureplatform.azurewebsites.net/
Technologies
Data Storage: Relational Databases
IaaS (Infrastructure as a Service), SMP (Symmetric Multi-Processing): relational database of your choice in a virtual machine
PaaS (Platform as a Service), SMP: SQL Database (Managed Database or Managed Instance), Database for MySQL, Database for PostgreSQL, Database for MariaDB
PaaS, MPP (Massively Parallel Processing): SQL Data Warehouse
Data Storage: Big Data
PaaS (Platform as a Service) options:
Object storage (flat): Blob Storage
Hierarchical storage: Data Lake Store (Gen1)
Multi-modal: Data Lake Storage (Gen2) (preview)
Data Storage: NoSQL
PaaS (Platform as a Service) options:
HDInsight HBase Cluster
Table Storage
Cosmos DB (multi-model: key-value, column family, JSON documents, graph)
Data Storage: Analytical & OLAP
IaaS (Infrastructure as a Service): SQL Server Analysis Services in a virtual machine
PaaS (Platform as a Service): Analysis Services, Power BI
Compute: Big Data
Service models shown: IaaS (Infrastructure as a Service), PaaS (Platform as a Service), SaaS (Software as a Service)
IaaS: HDInsight in a VM
Managed options: HDInsight Spark Cluster, HDInsight Hadoop Cluster, HDInsight Interactive Query Cluster (Hive LLAP), Databricks, Data Lake Analytics
Compute: Streaming & Event Processing
PaaS (Platform as a Service) options: HDInsight Kafka Cluster, HDInsight Storm Cluster, HDInsight Spark Streaming Cluster, IoT Hub, Event Hub, Stream Analytics
Compute: Data Integration
IaaS (Infrastructure as a Service): tool of your choice in a virtual machine
PaaS (Platform as a Service) and serverless: Data Factory, Databricks, HDInsight (Spark, Hive, Pig, Sqoop, Oozie), Functions, Automation
Reference Architectures
The following are examples only! There are many variations, and many opportunities to swap one service for another.
Small/Medium Data Warehousing
Source data (multi-structured data): Blob Storage
DW (structured data): SQL Database
Semantic layer: Analysis Services
Reporting & analysis tools: Power BI, Excel
Enterprise Data Warehousing and BI
Multi-structured data: Data Lake Storage
DW (structured data): SQL Data Warehouse
Data mart(s): SQL Database
Semantic layer: Analysis Services
Reporting & analysis tools: Power BI, Excel
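To make the data mart and semantic layer roles concrete, here is a minimal sketch in Python, assuming a hypothetical star schema: a sales fact table keyed to a store dimension. The table names, columns, and values are illustrative only; in this architecture the tables would live in SQL Database and the measure would be defined once in Analysis Services, not in application code.

```python
# Hypothetical dimension table (in a data mart this would be a SQL Database table).
DIM_STORE = {
    1: {"store_name": "Downtown", "region": "East"},
    2: {"store_name": "Airport", "region": "West"},
    3: {"store_name": "Mall", "region": "East"},
}

# Hypothetical fact table: one row per sale, keyed to the store dimension.
FACT_SALES = [
    {"store_key": 1, "amount": 100.0},
    {"store_key": 2, "amount": 250.0},
    {"store_key": 3, "amount": 50.0},
    {"store_key": 1, "amount": 25.0},
]


def total_sales_by(attribute):
    """A semantic-layer style measure: SUM(amount) sliced by a dimension attribute."""
    totals = {}
    for row in FACT_SALES:
        # Join the fact row to its dimension row, then group by the chosen attribute.
        key = DIM_STORE[row["store_key"]][attribute]
        totals[key] = totals.get(key, 0.0) + row["amount"]
    return totals


print(total_sales_by("region"))  # {'East': 175.0, 'West': 250.0}
```

Because every report calls the same measure, Power BI and Excel users see the same totals; that consistency is a common reason to place a semantic layer between the data marts and the reporting tools.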
Data Science and Artificial Intelligence
Multi-structured data: Data Lake Storage
Data science and AI: Databricks, Machine Learning, HDInsight, Cognitive Services
DW (structured data): SQL Data Warehouse
Data mart(s): SQL Database
Reporting & analysis tools: Power BI, Excel
Unified Data Science & Data Engineering
Data lake (multi-structured data): Data Lake Storage, organized into raw data, curated data, and a data science sandbox
Databricks: exploratory analytics, plus operationalized analytics run as a scheduled notebook job
Structured data: SQL Database
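The step from raw data to curated data is typically where a scheduled notebook job parses types, drops invalid records, and removes duplicates. A minimal sketch in Python, assuming hypothetical sales records and validation rules; plain lists stand in for files in Data Lake Storage, and nothing here uses a Databricks or Azure API.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Sale:
    # Hypothetical curated schema for illustration.
    order_id: str
    order_date: date
    amount: float


def curate(raw_rows: list[dict]) -> list[Sale]:
    """Promote raw rows to curated records: parse, validate, de-duplicate."""
    curated: dict[str, Sale] = {}
    for row in raw_rows:
        try:
            sale = Sale(
                order_id=row["order_id"].strip(),
                order_date=date.fromisoformat(row["order_date"]),
                amount=float(row["amount"]),
            )
        except (KeyError, ValueError, AttributeError):
            continue  # malformed row: it stays in the raw zone only
        if sale.amount < 0:
            continue  # hypothetical business rule: no negative amounts
        curated[sale.order_id] = sale  # duplicates collapse; last one wins
    return sorted(curated.values(), key=lambda s: s.order_id)


raw = [
    {"order_id": "A1", "order_date": "2018-10-01", "amount": "19.99"},
    {"order_id": "A1", "order_date": "2018-10-01", "amount": "19.99"},  # duplicate
    {"order_id": "B2", "order_date": "not a date", "amount": "5"},      # bad date
    {"order_id": "C3", "order_date": "2018-10-02", "amount": "-1"},     # negative
    {"order_id": " D4 ", "order_date": "2018-10-03", "amount": "7.5"},
]
curated = curate(raw)
print([s.order_id for s in curated])  # ['A1', 'D4']
```

Keeping the raw zone untouched means a rule like this can be changed later and the curated zone rebuilt from the original files.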
Big Data Interactive Querying (SQL on Hadoop)
Data lake: Data Lake Storage
Query engine: HDInsight Interactive Query Cluster (Hive LLAP), exposing a Hive data warehouse queried with HiveQL
Hive metastore: SQL Database
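Hive follows a schema-on-read approach: files land in the data lake as-is, and the table definition kept in the metastore is applied only when a query reads them. The sketch below imitates that idea in plain Python; it does not use Hive or any Azure API, and the table name, columns, and file contents are hypothetical.

```python
import csv
import io

# Hypothetical metastore entry: table name -> ordered (column, type) pairs.
# The schema lives here, separate from the data files.
METASTORE = {
    "web_logs": [("ts", str), ("url", str), ("status", int), ("bytes", int)],
}

# Hypothetical raw file as it might land in the data lake (no schema attached).
RAW_FILE = (
    "2018-10-20T10:00:00,/home,200,512\n"
    "2018-10-20T10:00:05,/missing,404,0\n"
    "2018-10-20T10:00:09,/home,200,1024\n"
)


def read_table(table, raw_text):
    """Schema-on-read: apply the metastore schema only when rows are read."""
    schema = METASTORE[table]
    for fields in csv.reader(io.StringIO(raw_text)):
        if not fields:
            continue  # skip blank lines
        yield {name: cast(value) for (name, cast), value in zip(schema, fields)}


def bytes_by_url(rows):
    """Roughly: SELECT url, SUM(bytes) FROM web_logs WHERE status = 200 GROUP BY url"""
    totals = {}
    for row in rows:
        if row["status"] == 200:
            totals[row["url"]] = totals.get(row["url"], 0) + row["bytes"]
    return totals


print(bytes_by_url(read_table("web_logs", RAW_FILE)))  # {'/home': 1536}
```

Changing the schema here means editing only the metastore entry; the raw file is never rewritten, which is the main trade-off schema-on-read offers over loading data into fixed tables first.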
Big Data Batch Processing
Data lake (multi-structured data): Data Lake Storage
Big data job processing: Data Lake Analytics running U-SQL jobs (Job 1, Job 2)
U-SQL extensions: Python, Cognitive Services
ADLA catalog database: tables, views, schemas, procedures, functions, assemblies, external data sources
External data sources: SQL Server in a VM, SQL DB, SQL DW
IoT + Batch Data (Lambda Architecture)
Speed layer: Event Hub → Stream Analytics → Power BI streaming dashboard
Batch layer: Data Lake Storage → SQL Data Warehouse
Serving layer: Analysis Services, Power BI, Excel
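The lambda pattern answers queries by combining a complete but periodically recomputed batch view with a small real-time view that covers only the events since the last batch run. A minimal sketch in Python, assuming hypothetical device events with a numeric timestamp; in-memory counters stand in for the batch and serving layers (SQL Data Warehouse, Analysis Services) and the speed layer (Stream Analytics).

```python
from collections import Counter


def batch_view(master_dataset):
    """Batch layer: recompute event counts per device from the full history."""
    return Counter(event["device_id"] for event in master_dataset)


def speed_view(recent_events, batch_cutoff):
    """Speed layer: count only events newer than the last batch run."""
    return Counter(
        event["device_id"] for event in recent_events if event["ts"] > batch_cutoff
    )


def serve(batch, speed, device_id):
    """Serving layer: merge both views (Counter returns 0 for missing keys)."""
    return batch[device_id] + speed[device_id]


# Hypothetical data: the last batch run covered everything up to ts = 3.
history = [
    {"device_id": "d1", "ts": 1},
    {"device_id": "d1", "ts": 2},
    {"device_id": "d2", "ts": 3},
]
recent = [
    {"device_id": "d1", "ts": 2},  # already in the batch view, so ignored
    {"device_id": "d1", "ts": 4},
    {"device_id": "d2", "ts": 5},
]
batch = batch_view(history)
speed = speed_view(recent, batch_cutoff=3)
print(serve(batch, speed, "d1"))  # 3
print(serve(batch, speed, "d2"))  # 2
```

The cutoff is what keeps the two layers from double counting: once the next batch run absorbs the recent events, the speed view can be discarded and rebuilt from the new cutoff.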
Operational BI (Embedded BI)
Source data: SQL Database
Data model + reports: Power BI Desktop
Published reports: Power BI Service, in an app workspace on Premium capacity
Embedded visuals: custom application, via REST API calls
Web Application
Web page → Web App (App Service Plan) → SQL Database
Supporting services: Cache, Diagnostics, Backups, two Storage Accounts
Wrap-Up
More Info
Solution Architectures: https://azure.microsoft.com/en-us/solutions/architecture/
More Info
Data Architecture Guide: https://docs.microsoft.com/en-us/azure/architecture/data-guide/
Thanks!
Download the latest version of the slides: SQLChick.com > Presentations & Downloads page
Creative Commons License 3.0:
Attribute to me as original author if you share this material.
No usage of this material for commercial purposes.
No derivatives or changes to this material.