Data Platform Futures
Jon Jahren, Data & AI Architect, Microsoft
jon.jahren@microsoft.com
© 2017 Microsoft. All rights reserved. The following content contains forward-looking statements, including ongoing Microsoft research, and may include non-committed research and product engineering activities that Microsoft may not deliver as future operations, future products or new features for existing products. This information is subject to change at any time without prior notification. Statements contained in this document concerning these matters only reflect Microsoft's expectations as of the date of this document. Changes in product strategy resulting from technological, internal corporate, market and other changes may occur. This is not a commitment to deliver any material, code or functionality and should not be relied upon in making purchasing decisions.
SQL v.next & Scale-out Azure SQL
Futures in IoT and Big Data Engineering
What's bigger than Big Data
Informational Graphs, bots and Reinforcement Learning
Machine Reading and Natural Language Computing
Homomorphic Security: doing analytics on someone else's private data
Collaborative Clouds and multi-party data vaults
Quantum Machine Learning
Quantum Networks and Quantum Database Computing
SQL Server 2017: industry-leading performance and security, now on Linux and Docker
Choice of platform and language: T-SQL, Java, C/C++, C#/VB.NET, PHP, Node.js, Python, Ruby
Industry-leading performance: #1 TPC-H performance (1TB, 10TB, 30TB), #1 TPC-E performance, #1 price/performance
Most secure over the last 7 years [chart: vulnerabilities, 2010-2016]
Only commercial DB with AI built-in: R and Python + in-memory at massive scale, native T-SQL scoring
End-to-end mobile BI on any device, at a fraction of the cost [chart: self-service BI per user for Microsoft, Tableau and Oracle; values shown are $120, $480 and $2,230]
1/10th the cost of Oracle
In-memory across all workloads
Most consistent data platform: private cloud, public cloud
SQL Server 2017: Key New Functionality
[Figure: query times under Plan 1, Plan 2 and Plan 3, followed by a revert to the previously effective plan (Plan 2)]
[Figure: graph data example with the labels Andy Smith, Skill, Statistics, Degree earned, B.S. Science, Finance, Position, Business Analyst]
R and Python + in-memory at massive scale; native T-SQL scoring
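The plan-revert idea on this slide can be sketched in a few lines. This is only an illustration of the concept, not SQL Server's actual algorithm: the function name `choose_plan`, the `history` structure and the `regression_factor` threshold are all hypothetical.

```python
from statistics import mean


def choose_plan(history, current_plan, regression_factor=1.5):
    """Return the plan to use next.

    history maps a plan name to the query times (ms) observed with it.
    regression_factor is a hypothetical threshold, not a SQL Server setting.
    """
    current_avg = mean(history[current_plan])
    # Average time of every other plan that has been used before.
    previous = {
        plan: mean(times)
        for plan, times in history.items()
        if plan != current_plan and times
    }
    if not previous:
        return current_plan
    best_plan = min(previous, key=previous.get)
    # Revert only when the current plan is clearly slower than the best earlier one.
    if current_avg > regression_factor * previous[best_plan]:
        return best_plan
    return current_plan


history = {
    "Plan 1": [120.0, 118.0, 125.0],
    "Plan 2": [60.0, 62.0, 58.0],
    "Plan 3": [300.0, 310.0, 295.0],
}
print(choose_plan(history, "Plan 3"))  # Plan 3 regressed, so this prints "Plan 2"
```

The threshold keeps the sketch from flipping between plans on small fluctuations; a real system would also need enough samples per plan before deciding.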
SQL Server vNext investment themes
Reason over any data, anywhere
Choice of language and platform
Industry-leading performance and security
Only commercial database with AI built-in
Continued improvements to SQL Server on Linux
It's not practical to think in terms of a Roadmap anymore
Event horizon
I recommend we focus on today
Now: Dynamic Networks; Big Data; Mobility; Knowledge Exploration (Descriptive, Diagnostic, Predictive, Prescriptive); Patterns & Insight (Machine Learning); Virtual / Augmented Reality; Security & Cyber-forensics
Next years: Homomorphic Security; What's bigger than Big Data; Algorithms / Artificial Intelligence; Natural Language Computing; Informational Graphs; Context and Learning; Relationship Graphs; Holo-portation (3D See, Hear, Interact); Dialogue Systems; Field Programmable Gate Arrays (Algorithm Hardware); Off Grid Computing; Underwater Datacenters
A few more years: Quantum Computing; Cryogenic Temperature Memory; Materials that can store data with light; Storage of information on DNA strands (all of the data on the net could fit in a shoe box); Cryptography, Security, Applied Math
Axis labels: Defined, Researched
The Data Flow [architecture diagram]
Stages: Devices and Data Sources; Data Ingestion & Processing, Command and Control; Presentation & Business Connectivity
Cold Path Analytics: Azure HDInsight, AzureML, Azure Data Lake, Data Lake Analytics, Power BI
Warm Path Analytics: Time Series Insights
Hot Path Analytics: Azure Stream Analytics, Azure HDInsight Storm
Hot Path Business Logic: Service Fabric & Actor Framework
Cloud Gateway: IoT Hub, Event Hubs
Other components shown: OnPrem Data Store, Other Data, Azure Functions, App Service, Web Apps, Mobile Apps, Logic Apps, Notification Hubs, BizTalk Services, OPC-UA Client, OPC-UA Proxy, Azure IoT Edge Gateway
Solution scenarios: Big Data & Advanced Analytics
Modern Data Warehousing: "We want to integrate all our data, including big data, with our data warehouse."
Advanced Analytics: "We are trying to predict when our customers churn."
Real-time Analytics: "We are trying to get insights from our devices in real time, etc."
Data warehousing pattern in Azure
Loading and preparing data for analysis with a data warehouse
Data sources: applications; logs, files and media (unstructured); business / custom apps (structured)
Diagram components: Data Factory, Azure Import/Export Service, Azure Data Box, APIs, CLI & GUI tools, Data Lake Store, Azure Storage, Azure Databricks, HDInsight, Data Lake Analytics, Azure SQL DW, Cosmos DB, SQL DB, AAS, dashboards
Advanced analytics pattern in Azure
Performing data collection/understanding, modeling and deployment
Data sources: sensors and IoT (unstructured); applications; logs, files and media (unstructured); business / custom apps (structured)
Diagram components: Azure ML, Azure ML Studio, ML Server, Azure Databricks (Spark ML), SQL Server (In-database ML), Data Science VM, Batch AI, Data Factory, Data Lake Store, Azure Storage, Data Lake Analytics, HDInsight, Cosmos DB, SQL DB, SQL DW, Azure Container Service, Azure Analysis Services, dashboards
Big data streaming pattern with Azure
Data sources: sensors and IoT (unstructured); logs, files and media (unstructured); business / custom apps (structured)
Diagram components: Event Hubs, IoT Hub, Kafka on HDInsight, Stream Analytics, Azure Databricks (Spark Streaming), Storm on HDInsight, Azure ML Studio, R Server, Azure Databricks (Spark ML), real-time applications, real-time dashboards
There is a natural balance in IoT between the cloud and the edge
Diagram labels: Solutions, Things, Build, Connect, Manage, Insights, Actions
IoT Pattern + Edge
Diagram labels: Azure IoT Hub, Cloud Gateway, Solutions, Things, Build, Connect, Manage, Insights, Actions
IoT-scale time-series data store
Schema-less store, just send data
Easy IoT Hub connection
Store, query and visualize billions of events
Simple and fast navigation
Diagram labels: Solutions, Things, Insights, Discover, Operationalize, Refine, Actions
Multi-party data collaborative
Pillars: trusted data, accessible data, secure analytics, verified compliance
Technology and business model innovations: trust frameworks, encrypted data processing, tamper-resistant audit logs, data trustees, policy-based key recovery, smart contracts, policy-encumbered datasets, data markets, data provenance, digital chain of custody, multi-party data vaults
Customer-connected innovation engagements
UN / World Bank: a platform for National Statistics and Sustainable Development for established and developing nations
City of Seattle: urban mobility experience and design
City of Bellevue: pedestrian safety through video analytics
Financial Fabric: data-sharing and analytics between hedge funds and pension funds to manage systemic risk
Answer ALS: data-sharing platform for the largest ALS research collaborative in the world
UC Davis: statewide water-energy conservation
City of Bellevue, NIST: end-to-end water insight and emergency response
San Diego County Courts: juvenile recidivism
Diagram labels: Industry-focused, Geographic Scope, Multi-party protected data sets
Data management gateway
Manual config of RBAC-based data access
Manual process of publishing
Diagram labels: survey data, administrative micro data, open data, Landsat
Deep neural networks have enabled major advances in machine learning and AI: computer vision, language translation, speech recognition, question answering, and more
Problem: DNNs are challenging to serve and deploy in large-scale online services
[Figure: Recurrent Neural Networks, with inputs x_{t-1}, x_t, x_{t+1}, hidden states h_{t-1}, h_t, h_{t+1} and outputs y_{t-1}, y_t, y_{t+1}]
[Figure: Convolutional Neural Networks]
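The recurrent network figure on this slide (inputs x_t, hidden states h_t, outputs y_t) corresponds to a simple recurrence, sketched below with scalar values. The weights, the tanh activation and the function names are assumptions chosen for illustration; the slide does not specify a particular cell type.

```python
import math


def rnn_step(x_t, h_prev, w_xh, w_hh, w_hy):
    """One time step of a scalar Elman-style RNN (illustrative weights)."""
    h_t = math.tanh(w_xh * x_t + w_hh * h_prev)  # new hidden state from input and previous state
    y_t = w_hy * h_t                             # output at time t
    return h_t, y_t


def run_rnn(xs, w_xh=0.5, w_hh=0.8, w_hy=1.0, h0=0.0):
    """Unroll the recurrence over a sequence and collect the outputs."""
    h = h0
    ys = []
    for x in xs:
        h, y = rnn_step(x, h, w_xh, w_hh, w_hy)
        ys.append(y)
    return ys


print(run_rnn([1.0, 0.0, -1.0]))
```

Each step depends on the previous hidden state, so the steps of one sequence cannot be computed in parallel. That sequential dependency is one reason low-latency serving of such models at small batch sizes is hard.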
DNN Processing Units
Spectrum from flexibility to efficiency: CPUs, GPUs, Soft DPU (FPGA), Hard DPU, ASICs
Diagram labels: Registers, Control Unit (CU), Arithmetic Logic Unit (ALU)
Soft DPU examples: BrainWave, Baidu SDA, Deephi Tech, ESE, Teradeep, etc.
Hard DPU examples: Cerebras, Google TPU, Graphcore, Groq, Intel Nervana, Movidius, Wave Computing, etc.
Performance: excellent inference performance at low batch sizes; ultra-low latency serving on modern DNNs, >10X lower than CPUs and GPUs; scale to many FPGAs in a single DNN service
Flexibility: FPGAs are ideal for adapting to rapidly evolving ML (CNNs, LSTMs, MLPs, reinforcement learning, feature extraction, decision trees, etc.); inference-optimized numerical precision; exploit sparsity and deep compression for larger, faster models
Scale: Microsoft has the world's largest cloud investment in FPGAs; multiple Exa-Ops of aggregate AI capacity; BrainWave runs on Microsoft's scale infrastructure
RSA-2048 Challenge Problem
251959084756578934940271832400483985714292821262040
320277771378360436620207075955562640185258807844069
1829064124951508218929855914917618450280848912007284
4992687392807287776735971418347270261896375014971824
6911650776133798590957000973304597488084284017974291
00642458691817195118746121515172654632282216869987549
182422433637259085141865462043576798423387184774447
9207399342365848238242811981638150106748104516603773
0605620161967625613384414360383390441495263443219011
4657544454178424020924616515723350778707749817125772
467962926386356373289912154831438167899885040445364
023527381951378636564391212010397122822120720357
(a single number, wrapped across lines)
Classical: 1 billion years
Quantum: 100 seconds
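To give a feel for why factoring is the hard part of this challenge, here is a minimal sketch of trial division, the simplest classical factoring method. It is far slower than the best known classical algorithms, so it does not reproduce the time estimate on the slide; the function name and the toy modulus 3233 are chosen purely for illustration.

```python
import math


def trial_division(n):
    """Return (p, n // p) where p is the smallest prime factor of n."""
    if n < 2:
        raise ValueError("n must be at least 2")
    if n % 2 == 0:
        return 2, n // 2
    # Only odd candidates up to sqrt(n) need to be checked.
    for p in range(3, math.isqrt(n) + 1, 2):
        if n % p == 0:
            return p, n // p
    return n, 1  # n itself is prime


print(trial_division(3233))  # toy RSA-style modulus 53 * 61, prints (53, 61)

# For a 2048-bit modulus the search bound sqrt(n) is itself a 1024-bit number,
# so the loop above could never finish in practice.
print(math.isqrt(2**2047).bit_length())  # prints 1024
```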
[Figure: the two values of a single bit (0, 1) and the eight values of a 3-bit register (000, 001, 010, 011, 100, 101, 110, 111)]
[Figure: a quantum processor computing F(x) takes the inputs 000 through 111 together and yields F(000) through F(111)]
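The picture of a quantum processor evaluating F on all eight inputs at once can be mimicked classically for tiny registers. The sketch below is a classical simulation under simplifying assumptions: the state is a dictionary of amplitudes, F is applied as the map |x>|0> to |x>|F(x)>, and `parity` is a hypothetical stand-in for F. It is not how a real quantum device is programmed.

```python
import itertools
import math
import random


def uniform_superposition(n_bits):
    """Equal amplitude on every n-bit basis state."""
    amp = 1 / math.sqrt(2 ** n_bits)
    return {"".join(bits): amp for bits in itertools.product("01", repeat=n_bits)}


def apply_function(state, f):
    """Map each basis state |x>|0> to |x>|f(x)>, keeping its amplitude."""
    return {(x, f(x)): amp for x, amp in state.items()}


def measure(state, rng):
    """Sample one basis state with probability |amplitude|^2."""
    outcomes = list(state)
    weights = [abs(amp) ** 2 for amp in state.values()]
    return rng.choices(outcomes, weights=weights, k=1)[0]


def parity(x):
    """Hypothetical F: parity of the bits of x."""
    return str(x.count("1") % 2)


state = apply_function(uniform_superposition(3), parity)
print(len(state))                        # 8 (x, F(x)) pairs held in one state
print(measure(state, random.Random(0)))  # a measurement reveals only one pair
```

The simulated state holds all eight results, but a measurement returns a single (x, F(x)) pair, which is why quantum algorithms need interference steps before measuring. The classical simulation also doubles in size with every added bit.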
Nitrogen fixation 100-200 100-200 100s-1000s 100s-1000s