From the Source to the Dashboard: SAP Agile Data Warehousing for Self-Service BI
Michael D Rutland, Sr SE, SAP / @TDWI, 9 October 2017, Savannah
Disclaimer
The information in this presentation is confidential and proprietary to SAP and may not be disclosed without the permission of SAP. Except for your obligation to protect confidential information, this presentation is not subject to your license agreement or any other service or subscription agreement with SAP. SAP has no obligation to pursue any course of business outlined in this presentation or any related document, or to develop or release any functionality mentioned therein. This presentation, or any related document, and SAP's strategy and possible future developments, products and/or platform directions and functionality are all subject to change and may be changed by SAP at any time for any reason without notice. The information in this presentation is not a commitment, promise or legal obligation to deliver any material, code or functionality. This presentation is provided without a warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability, fitness for a particular purpose, or non-infringement. This presentation is for informational purposes and may not be incorporated into a contract. SAP assumes no responsibility for errors or omissions in this presentation, except if such damages were caused by SAP's intentional or gross negligence. All forward-looking statements are subject to various risks and uncertainties that could cause actual results to differ materially from expectations. Readers are cautioned not to place undue reliance on these forward-looking statements, which speak only as of their dates, and they should not be relied upon in making purchasing decisions.
Market Expectations
Gartner [1]: Emerging data sources, trends and technologies challenge the effectiveness of data warehouses in supporting analysis and decision making.
IDC [2]: The data warehousing market based on relational databases will continue to be disrupted by several nonrelational and/or nonschematic information management software categories. Data warehouses will not disappear as they have a key place in an organization's data architecture.
[1] 2016 Strategic Roadmap for Modernizing Your Data Warehouse Initiatives, Mark Beyer and Lakshmi Randall, Gartner, October 2016.
[2] Worldwide Business Analytics Software Forecast, 2016-2019, Dan Vesset et al., IDC, July 2016. Doc # 257402.
SAP HANA Platform
The data management and application platform for all applications (all devices; SAP, ISV and custom applications).
- Application Development: Web Server, JavaScript, Fiori UX, Graphic Modeler, Application Lifecycle Management (ALM)
- Advanced Analytical Processing: Spatial, Graph, Predictive, Search, Text Analytics, Streaming Analytics, Series Data, Business Functions
- Data Integration & Quality: Data Virtualization, ELT & Replication, Data Quality, Hadoop & Spark Integration, Remote Data Sync
- Database Management: Columnar OLTP+OLAP, Multi-Core & Parallelization, Advanced Compression, Multi-tenancy, Multi-Tier Storage, Data Modeling, Openness, Admin & Security, High Availability & Disaster Recovery
SAP HANA Platform: How does SAP approach Data Warehousing?
Two ways to run, or get the best of both.
Application-driven approach: SAP BW/4HANA as premium DW application with integrated services
- SAP BW/4HANA is an application offering.
- All data warehousing services via one integrated repository: scheduling & monitoring, OLAP, modeling, lifecycle management, planning, ETL.
- Optional integration of additional tools for modeling, monitoring and managing the data warehouse.
SQL-driven approach: SAP HANA with loosely coupled tools and platform services, logically combined
- SQL approaches require several loosely coupled tools, usually having separate repositories.
- Best-of-breed approach to build your own model.
- HANA SQL DW services: scheduling & monitoring, OLAP, modeling, lifecycle management, planning, ETL.
[Diagram: both approaches are drawn on top of the SAP HANA Platform.]
Why should you choose HANA SQL DW?
Strengths
- Complete web approach with the HANA XS Advanced platform, still a 100% open SQL approach.
- Strong and open repository versioning with Git.
- Freedom to custom-build data models and data management processes. Example: adopt a Data Vault model.
- Leverage 3rd-party tools and in-house standards, skills and knowledge.
- DevOps enabler: Continuous Testing, Continuous Integration, Continuous Deployment.
Use cases
- Considerable share of non-SAP source systems and interfacing.
- Specific data model requirements, for example for auditability.
- 3rd-party DW replacement.
- DevOps requirements.
- Public cloud deployment (SQL DW not fully available yet).
[Diagram: HANA SQL DW services (scheduling & monitoring, modeling, planning, OLAP, lifecycle management, ETL) on the SAP HANA Platform.]
Introducing the HANA SQL DW application toolset
Phases: Design, Develop, Deploy, Run
HANA SQL Data Warehouse: data process perspective of the SAP-defined SQL DW
[Diagram, top to bottom]
- Consume: BusinessObjects BI, Predictive, Planning, 3rd-party analytics
- Model, Compute & Data Store: SAP HANA, Data Lake (SAP Vora)
- Tooling: WebIDE, SAP PowerDesigner / SAP Enterprise Architecture Designer, GitHub
- Ingest: ETL, Replication, Streaming, Virtual Access
- Sources: Sensor, Machine
HANA SQL Data Warehouse: what are the components that define a DW?
- Data Integration: SAP HANA EIM (SDI/SDQ) + Agile Data. Access, integrate, cleanse, match, and enhance data.
- Data Quality Management: SAP HANA EIM, Agile Data Preparation. Assess, monitor quality, metadata management, track business impact.
- Enterprise Modeling: SAP Enterprise Architecture Designer / PowerDesigner / SAP Web IDE. Model data across the enterprise.
- Data Lineage / Impact Analysis: SAP Enterprise Architecture Designer / SAP Web IDE. Identify impacts and implement changes.
- Information Lifecycle Management: SAP HANA Data Warehousing Foundation (HANA DWF). Manage and schedule the data processing and lifecycle of information.
- SAP HANA Application Lifecycle Management: model, assemble, configure, version & deploy products / releases.
[Diagram: the components are arranged around HANA Platform Services.]
Introducing the HANA SQL DW application toolset: modeling your processes and data
Phases: Design, Develop, Deploy, Run
Design tools: SAP PowerDesigner, SAP Enterprise Architecture Designer
SAP Enterprise Architecture Designer, Edition for SAP HANA
Create and integrate enterprise, landscape, process, and data models to manage information and systems effectively.
- Business process architecture
- Landscape and application architecture
- Requirements management
- Strategy architecture to document goals and projects
- Physical data modeling & data architecture
- Reverse engineering capabilities
- Lineage & impact analysis
[Diagram labels: Strategy, Business, Technology, Design, Implementation; Process, Data, Landscape, Requirements.]
Enterprise Architecture Designer: specifics for SAP HANA
- Reverse-engineering capabilities
- Impact analysis, model comparison
- Supports HANA HDI
- Capabilities to generate:
  - Tables & views
  - Data movement models (flowgraphs)
  - Native DataStore Objects
  - Virtual table definitions
  - HANA CDS associations
- Offers Git integration
Building the SQL DW: one environment to build all artefacts
Phases: Design, Develop, Deploy, Run
SAP Web IDE for HANA
- Develop the entire DW model from your browser.
- Major extensions for DW functions (flowgraphs, NDSO, DLM, task chains).
SAP Web IDE for SAP HANA
SAP Web IDE for SAP HANA is the successor to the SAP HANA web development workbench and the development perspectives of SAP HANA studio.
It offers:
- Development of SAP HANA content and models
- UI development with SAPUI5
- Development of polyglot applications: Node.js, Java or XSJS business code
- Git integration
It is:
- Browser based
- Installed as an SAP HANA XSA application
SAP Web IDE: Calculation Views & Flowgraphs
SAP Web IDE: Native DataStore Objects & Task Chains
SAP Data Warehousing Foundation - NDSO: simplification of the Data Warehouse
Classic DWH best practice for request management and delta handling (design and development effort)
- To enable delta propagation or roll-back of data loads, request or batch management is needed.
- Metadata on data loads needs to be stored in the target table being loaded (e.g. a batch ID), and a metadata framework is developed to record load date/time, execution user, and number of records loaded.
- To allow for roll-back, an additional table is needed to record all changes (before/after image), or all data changes need to be time-sliced in the target table.
- Setting this up and keeping it running can take considerable effort, for example for the design of metadata tables, roll-back database procedures, and monitoring functions.
- Running these processes can be resource intensive and increase DWH load times.
Native DataStore Object (out of the box)
- The NDSO provides request management and delta handling out of the box.
- The NDSO is delivered with a friendly user interface for load monitoring and request handling features such as roll-back.
- The NDSO can be defined in a textual and graphical way by leveraging HANA CDS capabilities (associations).
- The NDSO integrates natively with EIM flowgraphs, and with 3rd-party ETL.
- The NDSO supports the delta language of SAP data source extractors.
[Diagram: in the classic approach, DB procedures maintain custom metadata tables (Batch ID, User, Date, Time, RunTime) for Batch 1 (Jan 13) through Batch 5 (Jan 17); with the NDSO, the same kind of metadata tables come out of the box.]
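To make the hand-built variant concrete, here is a minimal Python sketch of request management with roll-back via before-images. It is an illustration only, not the NDSO implementation: the class and method names (`RequestManagedTable`, `load`, `roll_back_latest`) are hypothetical, and an in-memory dictionary stands in for the target table and its metadata tables.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Request:
    """Metadata for one load, comparable to a row in a custom metadata table."""
    request_id: int
    user: str
    loaded_at: datetime
    record_count: int


@dataclass
class RequestManagedTable:
    """Hypothetical target table that keeps a before-image log per request."""
    active: dict = field(default_factory=dict)      # key -> current row
    requests: list = field(default_factory=list)    # load metadata, oldest first
    change_log: dict = field(default_factory=dict)  # request_id -> [(key, before_image)]
    next_id: int = 1

    def load(self, rows: dict, user: str) -> int:
        """Upsert rows as one request and record before-images for roll-back."""
        request_id = self.next_id
        self.next_id += 1
        log = []
        for key, row in rows.items():
            # A before-image of None means the key did not exist before this load.
            log.append((key, self.active.get(key)))
            self.active[key] = row
        self.change_log[request_id] = log
        self.requests.append(
            Request(request_id, user, datetime.now(timezone.utc), len(rows))
        )
        return request_id

    def roll_back_latest(self) -> int:
        """Undo the most recent request by restoring its before-images."""
        request = self.requests.pop()
        for key, before in reversed(self.change_log.pop(request.request_id)):
            if before is None:
                self.active.pop(key, None)
            else:
                self.active[key] = before
        return request.request_id


table = RequestManagedTable()
first = table.load({"A": {"qty": 1}, "B": {"qty": 2}}, user="etl")
second = table.load({"B": {"qty": 5}, "C": {"qty": 7}}, user="etl")
table.roll_back_latest()  # restores B and removes C
```

Even this toy version needs a metadata structure, a change log and a roll-back routine, which is the design and development effort the slide contrasts with the out-of-the-box NDSO.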
SAP Data Warehousing Foundation - NDSO: embedded in HANA Web IDE, fundamentals
Native DataStore Object
- Provides a central persistence object with additional semantics to determine deltas.
- Move, aggregation and delta loads containing deleted records.
- Provides interoperability between native data warehouses and BW/4HANA.
- Embedded into HANA Web IDE using HANA CDS as metadata description language.
- Embedded into HANA flowgraphs.
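As a rough illustration of what "determining deltas, including deleted records" means, the sketch below compares the active data with a full snapshot of the source. This is a generic snapshot-comparison technique chosen for the example, not the mechanism the NDSO uses internally; the function names `compute_delta` and `apply_delta` are hypothetical.

```python
def compute_delta(active: dict, snapshot: dict) -> dict:
    """Compare active data with a full source snapshot, both keyed by business key."""
    return {
        "insert": {k: v for k, v in snapshot.items() if k not in active},
        "update": {
            k: v for k, v in snapshot.items() if k in active and active[k] != v
        },
        # Keys that vanished from the source are reported as deletions.
        "delete": sorted(k for k in active if k not in snapshot),
    }


def apply_delta(active: dict, delta: dict) -> dict:
    """Return a new active data set with the delta applied."""
    result = dict(active)
    result.update(delta["insert"])
    result.update(delta["update"])
    for key in delta["delete"]:
        del result[key]
    return result


active = {"A": 1, "B": 2, "C": 3}
snapshot = {"A": 1, "B": 5, "D": 4}
delta = compute_delta(active, snapshot)
new_active = apply_delta(active, delta)
```

Propagating only `delta` to downstream targets, rather than the whole snapshot, is the point of delta handling: unchanged rows such as `A` are never moved again.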
Integrated Data Warehouse Processes
Phases: Design, Develop, Deploy, Run
Data Warehousing Foundation
- Data Warehousing Scheduler
- Data Lifecycle Manager
- Data Warehousing Monitor
SAP HANA Data Warehousing Foundation - DLM: Data Lifecycle Manager
SQL Data Warehousing with DLM-managed data placement based on aging rules, read through DLM-generated union & pruning CalcViews.
- In-Memory (Hot Store): structured data for fast analytics
- Dynamic Tiering (Warm Store), SAP IQ: less frequently accessed, structured data
- Data Lake (Cold Store), SAP Vora, Hadoop: raw data (semi-structured, unstructured, streaming data etc.)
[Diagram: data volume scale from "TBs - 10s of TBs" to "10s of TBs - PBs".]
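The idea of aging rules and pruning can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not DLM itself: the thresholds (`WARM_AFTER_DAYS`, `COLD_AFTER_DAYS`) and function names are hypothetical, and a real aging rule would be defined per table in DLM.

```python
from datetime import date

# Hypothetical thresholds; real aging rules are modeled per table in DLM.
WARM_AFTER_DAYS = 365
COLD_AFTER_DAYS = 3 * 365

TIERS = ["hot", "warm", "cold"]  # ordered from youngest to oldest data


def target_store(record_date: date, today: date) -> str:
    """Aging rule: decide which store a record belongs to, based on its age."""
    age_days = (today - record_date).days
    if age_days >= COLD_AFTER_DAYS:
        return "cold"
    if age_days >= WARM_AFTER_DAYS:
        return "warm"
    return "hot"


def stores_to_query(from_date: date, today: date) -> list[str]:
    """Pruning: return only the stores that can hold rows dated on or after from_date."""
    oldest_needed = target_store(from_date, today)
    return TIERS[: TIERS.index(oldest_needed) + 1]


today = date(2017, 10, 9)
recent_only = stores_to_query(date(2017, 1, 1), today)   # hot store is enough
two_years = stores_to_query(date(2016, 1, 1), today)     # hot and warm stores
```

A union view over all three stores returns the complete data set, while the pruning step keeps a query for recent data from touching the warm and cold stores at all.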
SAP Data Warehousing Foundation - DLM: embedded in HANA Web IDE, common approach
Outlook: HANA DWF 2 SP2 (Sept 2017)
Data Lifecycle Manager (DLM)
- Offers data warehouse developers functionality to define displacement strategies for aged data in HANA to Spark, Vora, Sybase IQ, Dynamic Tiering or HANA Extension.
- Enables access to warm and cold data by generating pruning views (calculation views).
- Enables data displacement by generating HANA DB procedures.
- Embedded into the HANA Data Warehousing Scheduler through generation of DLM task chains.
SAP Data Warehousing Foundation - DWS: embedded in HANA Web IDE, common approach
Data Warehousing Scheduler (DWS)
- Provides a framework to define task chains as sequences of single tasks.
- Flexible start conditions.
- Parallelization and dependency handling.
- Provides the capability to schedule flowgraphs, NDSO-related tasks, project-local DB procedures (planned for DWF 2 SP02) and DLM-related tasks (planned for DWF 2 SP02).
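Dependency handling and parallelization in a task chain can be illustrated with Python's standard `graphlib` module. This is a generic sketch, not the DWS API: the task names and the `execution_waves` helper are hypothetical, and a chain is modeled as a mapping from each task to the set of tasks it depends on.

```python
from graphlib import TopologicalSorter


def execution_waves(chain: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into waves; all tasks in one wave could run in parallel."""
    sorter = TopologicalSorter(chain)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # tasks whose dependencies are all done
        waves.append(ready)
        sorter.done(*ready)
    return waves


# Hypothetical chain: task -> set of tasks that must finish first.
chain = {
    "load_orders_flowgraph": set(),
    "load_customers_flowgraph": set(),
    "activate_ndso": {"load_orders_flowgraph", "load_customers_flowgraph"},
    "dlm_relocation": {"activate_ndso"},
}

waves = execution_waves(chain)
# waves == [["load_customers_flowgraph", "load_orders_flowgraph"],
#           ["activate_ndso"],
#           ["dlm_relocation"]]
```

The two flowgraph loads have no dependencies, so they land in the same wave; the NDSO activation waits for both, and the DLM task runs last.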
Deploying the HANA SQL DW models
Phases: Design, Develop, Deploy, Run
- CTS+: XSA integrates with the enhanced Change and Transport System (CTS+).
- SAP Application Lifecycle Manager
- SAP HANA Product Installer
- Open source deployment: bring your own tools (Jenkins, XL Release, etc.).
Classic DWH development
All developers work in the same workspace and runtime, on the same version.
Versioning, branching and development with Git
Working in parallel on different repository versions.
[Diagram: User story 1, User story 2 and Master branches plotted over time.]
Deployment example
Continuous Testing, Continuous Integration, Continuous Deployment
[Diagram: WebIDE, Continuous Integration (CI) Server (assemble & deploy daily builds), SIT/UAT (regression test), Production, with a deploy step between stages.]
Agile Software Development in a typical Data Warehousing Scenario
Summary