Microsoft Analytics Platform System (APS)
  The turnkey modern data warehouse appliance
  Matt Usher, Senior Program Manager @ Microsoft
About.me
  @two_under
  Senior Program Manager
  9 years at Microsoft: Visual Studio, Office, Windows Server, Analytics Platform System (APS)
  Amazon, Deloitte Consulting, 5 startups
"Data warehousing has reached the most significant tipping point since its inception. The biggest, possibly most elaborate data management system in IT is changing."
  - Gartner, The State of Data Warehousing in 2012
The traditional data warehouse
  [Diagram: data sources; non-relational data]
The modern data warehouse
  [Diagram: data sources; non-relational data]
Insights from all your data
  Enrich and optimize your data from non-traditional sources
Microsoft Analytics Platform System: the turnkey modern data warehouse appliance
  Relational and non-relational data in a single appliance
  Enterprise-ready Hadoop
  Integrated querying across Hadoop and PDW using T-SQL
  Direct integration with Microsoft BI tools such as Microsoft Excel
  Near real-time performance with In-Memory Columnstore
  Ability to scale out to accommodate growing data
  Removal of data warehouse bottlenecks with MPP SQL Server
  Concurrency that fuels rapid adoption
  Industry's lowest data warehouse appliance price per terabyte
  Value through a single appliance solution
  Value with flexible hardware options using commodity hardware
Hardware and software engineered together: the ease of an appliance
  Analytics Platform System: pre-built hardware + software appliance
  Co-engineered with Dell, HP, and Quanta
  Components: SQL Server Parallel Data Warehouse, PolyBase, Microsoft HDInsight
  Pre-built hardware
  Pre-installed software
  Plug and play
  Built-in best practices
  Time savings
  Built for Big Data
Hadoop alone is not the answer to all Big Data challenges
  Steep learning curve, slow and inefficient
  Hadoop ecosystem: learn new skills instead of T-SQL
  Move HDFS into the warehouse before analysis: ETL
  New data sources: build, integrate, manage, maintain, support
APS delivers enterprise-ready Hadoop with HDInsight
  Manageable, secured, and highly available Hadoop integrated into the appliance
  Components: SQL Server Parallel Data Warehouse, PolyBase, Microsoft HDInsight
  High performance and tuned within the appliance
  End-user authentication with Active Directory
  100-percent Apache Hadoop
  Managed and monitored using System Center
  Accessible insights for everyone with Microsoft BI tools
Connecting islands of data with PolyBase
  Bringing Hadoop point solutions and the data warehouse together for users and IT
  [Diagram: a select goes in and a result set comes back; PolyBase connects SQL Server Parallel Data Warehouse with Microsoft HDInsight, Microsoft Azure HDInsight, Hortonworks for Windows and Linux, and Cloudera]
  Provides a single T-SQL query model for PDW and Hadoop with rich features of T-SQL, including joins without ETL
  Uses the power of MPP to enhance query execution performance
  Supports Microsoft Azure HDInsight to enable new hybrid cloud scenarios
  Provides the ability to query non-Microsoft Hadoop distributions, such as Hortonworks and Cloudera
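The "joins without ETL" idea can be pictured with a small Python sketch. It only illustrates joining warehouse rows with delimited, Hadoop-style text at query time, with no separate load step; it is not how PolyBase is implemented. The table contents, the file layout, and the function names `read_external_rows` and `join_on_product` are all hypothetical.

```python
import csv
import io

# Hypothetical warehouse table (the relational side): product -> category.
WAREHOUSE_PRODUCTS = {
    "xbox": "Devices",
    "excel": "Office",
    "ssas": "Data Platform",
}

# Hypothetical pipe-delimited text standing in for a file stored in HDFS.
HDFS_FILE = "Sean|xbox|-1\nAudie|excel|1\nRoger|ssas|1\nSanjay|wp8|1\n"


def read_external_rows(text):
    """Parse the delimited file at query time; nothing is staged or loaded first."""
    reader = csv.reader(io.StringIO(text), delimiter="|")
    for user, product, sentiment in reader:
        yield {"user": user, "product": product, "sentiment": int(sentiment)}


def join_on_product(external_rows, products):
    """Inner join of the external rows to the warehouse table on the product key."""
    return [
        (row["user"], row["product"], products[row["product"]], row["sentiment"])
        for row in external_rows
        if row["product"] in products
    ]


joined = join_on_product(read_external_rows(HDFS_FILE), WAREHOUSE_PRODUCTS)
for record in joined:
    print(record)
```

The row for `wp8` drops out because it has no match on the warehouse side, which is the usual inner-join behavior.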
PolyBase simplifies using Hadoop data
  Bringing islands of Hadoop data together
  Running high performance queries against Hadoop data
  Archiving data warehouse data to Hadoop (move)
  Exporting relational data to Hadoop (copy)
  Importing Hadoop data into a data warehouse (copy)
Automatic MapReduce pushdown
  [Diagram labels: source systems; Hadoop / Data Lake (Cloudera, Hortonworks, HDInsight); MapReduce; APS (SQL Server Parallel Data Warehouse, PolyBase, Microsoft HDInsight); T-SQL; SQL Server Data Marts; SQL Server Reporting Services; SQL Server Analysis Services; Analytics / Ad-hoc / Visualization; Day / Hour / Minute Refresh]
PolyBase predicate pushdown
  Dynamic binding to an HDFS file or directory:
    //hdfs/social_media/twitter
    //hdfs/social_media/twitter/daily.log
  Query (column filtering in the SELECT list, row filtering in the WHERE clause):
    SELECT User, Product, Sentiment
    FROM Twitter_Table
    WHERE Hour = Current - 1 AND Date = Today AND Sentiment > 0;
  Sample data in Hadoop:
    User    Location  Product  Sentiment  Rtwt  Hour  Date
    Sean    CA        xbox     -1         5     2     5-25-14
    Audie   CO        excel    1          0     2     5-25-14
    Suz     WA        xbox     0          0     2     5-25-14
    Tom     IL        sqls     1          8     2     5-25-14
    Sanjay  MN        wp8      1          0     1     5-25-14
    Roger   TX        ssas     1          0     23    5-25-14
    Steve   AL        ssrs     1          0     23    5-24-14
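The column and row filtering on this slide can be sketched in a few lines of Python. This is an illustration of the idea only (evaluate the predicate and trim the columns where the data lives, so less data travels back), not PolyBase's actual mechanism. The rows are a reconstruction of the slide's sample table, and the values chosen for "Current" and "Today" are assumptions.

```python
# Rows reconstructed from the slide's sample table (the layout is an assumption).
COLUMNS = ("User", "Location", "Product", "Sentiment", "Rtwt", "Hour", "Date")
TWITTER_ROWS = [
    ("Sean", "CA", "xbox", -1, 5, 2, "5-25-14"),
    ("Audie", "CO", "excel", 1, 0, 2, "5-25-14"),
    ("Suz", "WA", "xbox", 0, 0, 2, "5-25-14"),
    ("Tom", "IL", "sqls", 1, 8, 2, "5-25-14"),
    ("Sanjay", "MN", "wp8", 1, 0, 1, "5-25-14"),
    ("Roger", "TX", "ssas", 1, 0, 23, "5-25-14"),
    ("Steve", "AL", "ssrs", 1, 0, 23, "5-24-14"),
]


def scan_with_pushdown(rows, wanted_columns, predicate):
    """Filter rows and trim columns at the source, before anything is returned."""
    positions = [COLUMNS.index(name) for name in wanted_columns]
    for row in rows:
        record = dict(zip(COLUMNS, row))
        if predicate(record):                        # row filtering
            yield tuple(row[i] for i in positions)   # column filtering


# Assumed values for the slide's "Current" and "Today".
current_hour = 3
today = "5-25-14"

result = list(
    scan_with_pushdown(
        TWITTER_ROWS,
        ("User", "Product", "Sentiment"),
        lambda r: r["Hour"] == current_hour - 1
        and r["Date"] == today
        and r["Sentiment"] > 0,
    )
)
print(result)  # [('Audie', 'excel', 1), ('Tom', 'sqls', 1)]
```

Only two narrow rows leave the scan, instead of seven full-width rows being shipped and filtered afterwards.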
Demo
Performance and scale limitations in traditional data warehouses
  Scale up: diminishing scale as requirements grow; each capacity increase is a forklift of the data
  Rowstore: querying data by row gives sub-optimal performance for many data warehouse queries
  [Diagram: a table with columns C1 to C4 and rows R1 to R6, stored row by row across Page 1, Page 2, and Page 3]
Scaling out your data to petabytes
  Scale-out technologies in the Analytics Platform System
  Scale out: multiple nodes (PDW / HDInsight) with dedicated CPU, memory, and storage
  Ability to incrementally add hardware for near-linear scale to multiple petabytes
  Ability to handle query complexity and concurrency at scale
  No forklift of prior warehouse to increase capacity
  Ability to scale out HDInsight and PDW
  [Diagram: PDW capacity scale from 0 terabytes to 6 petabytes]
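A rough way to picture scale-out is hash distribution: each row is assigned to a node by hashing a key, every node aggregates its own slice, and the partial results are combined. The Python below is a minimal sketch of that general MPP pattern under stated assumptions; it is not PDW's engine. The names `node_for`, `distribute`, `local_totals`, and `parallel_sum` and the sample data are hypothetical, and the nodes are simulated one after another in a single process.

```python
import zlib
from collections import defaultdict


def node_for(key, node_count):
    """Pick a node from a stable hash of the distribution key."""
    return zlib.crc32(key.encode("utf-8")) % node_count


def distribute(rows, node_count):
    """Split rows across nodes so that equal keys always land on the same node."""
    nodes = [[] for _ in range(node_count)]
    for key, amount in rows:
        nodes[node_for(key, node_count)].append((key, amount))
    return nodes


def local_totals(node_rows):
    """Work one node does on its own slice of the data."""
    totals = defaultdict(int)
    for key, amount in node_rows:
        totals[key] += amount
    return dict(totals)


def parallel_sum(rows, node_count):
    """Combine per-node results; a key never spans nodes, so a plain merge is enough."""
    combined = {}
    for node_rows in distribute(rows, node_count):  # simulated sequentially here
        combined.update(local_totals(node_rows))
    return combined


# Hypothetical sales rows: (product, amount).
SALES = [("xbox", 10), ("excel", 5), ("xbox", 7), ("ssas", 3), ("excel", 1)]

print(parallel_sum(SALES, 2))
print(parallel_sum(SALES, 6))
```

Adding nodes changes where rows live but not the answer, which is the property that lets capacity grow by adding hardware rather than replacing it.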
Clustered columnstore index
  Why is a clustered columnstore index important?
  Saves space
  Provides easier management by eliminating maintenance of secondary indexes
  Supports all PDW data types, including high-precision decimal data types and more
  In-Memory Columnstore is featured in the storage engine in PDW AU1
  [Chart: space used in GB for a table with 101 million rows, across six configurations; 91% savings; space used = table space + index space]
Concurrency that fuels rapid adoption
  Great performance with mixed workloads
  [Diagram labels, Analytics Platform System: SQL Server SMP; ERP; CRM; LOB apps; ETL/ELT with SSIS, DQS, MDS; ETL/ELT with DWLoader; Intra-Day CRTAS; Near real-time PDW Link Table; Real-Time Reporting and cubes; Columnstore; ROLAP / MOLAP; DirectQuery; Hadoop / Big Data; PolyBase; HDInsight; SNAC; BI Tools; Ad hoc queries; Fast ad hoc]
Blazing-fast performance
  MPP and In-Memory Columnstore for next-generation performance
  Up to 100x faster queries
  Up to 15x more compression
  Store data in columnar format for massive compression
  Load data into or out of memory for next-generation performance with up to 60% improvement in data loading speed
  Updateable and clustered for real-time trickle loading
  [Diagram: columnstore index representation; updateable clustered columnstore vs. table with customary indexing; parallel query execution from query to results]
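Why a columnar layout compresses well can be shown with a toy example: once values of the same column sit next to each other, repeats form long runs. The Python sketch below pivots rows into columns and applies run-length encoding. Run-length encoding is used here only as a familiar stand-in for illustration; the slides do not say which encodings the In-Memory Columnstore uses, and the sample rows are hypothetical.

```python
from itertools import groupby

# Hypothetical rows: (location, product).
ROWS = [
    ("CA", "xbox"), ("CA", "xbox"), ("CA", "excel"),
    ("WA", "excel"), ("WA", "excel"), ("WA", "excel"),
]


def to_columns(rows):
    """Pivot row-oriented tuples into one list per column."""
    return [list(column) for column in zip(*rows)]


def rle_encode(values):
    """Collapse each run of equal adjacent values into a (value, count) pair."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def rle_decode(pairs):
    """Expand (value, count) pairs back into the original sequence."""
    return [value for value, count in pairs for _ in range(count)]


columns = to_columns(ROWS)
encoded = [rle_encode(column) for column in columns]
print(encoded)  # [[('CA', 3), ('WA', 3)], [('xbox', 2), ('excel', 4)]]
```

Twelve stored values shrink to four pairs, and a query that reads only one column touches only that column's pairs.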
Lowest $/TB for data warehouse appliance
  High performance using commodity hardware
  Significantly lower price per terabyte than the closest competitor
  Lower storage costs with Windows Server 2012 Storage Spaces
  [Chart: price per terabyte for leading vendors (Oracle, EMC, IBM, Teradata, Microsoft); y-axis in thousands of dollars, $0 to $30; price per terabyte for user-available storage (compressed)]
  NOTE: Orange line indicates average price per terabyte.
Demo
www.microsoft.com/aps www.microsoft.com/bigdata
Questions?
Thank You for Attending