ETL is No Longer King, Long Live SDD

How to Close the Loop from Discovery to Information (Data) to Insights (Analytics) to Outcomes (Business Processes)

A presentation by Brian McCalley of DXC Technology, Glenn Field of SiriusIQ and Gavin Robertson of WhamTech, Inc.

No secret that most organizations face major data-related hurdles:
Location, System, Access, Security, Container, Format, Age, Dirty, Typo/Transposition, Missing, Meaning, Duplication, Obfuscation, Governance

...and ANALYTICS is the prime driver to lower costs and increase revenue, which, in turn, drives the need for applications* to have clean and understood data in specific formats.

*Reporting, BI, analytics, CDI-MDM, CRM, SCM, fraud detection, anti-money laundering, ERP, etc.

Goals for an optimal data architecture
1. Complete, clean, transformed, standardized and secure data, and master data, for multiple applications
2. Near real-time: minimal update and query latency
3. Automation, including workflow and event processing
4. Support reporting, BI and analytics, including graph database
5. Minimize copies of data
6. Data discovery, metadata repository and data governance
7. Write back to data sources

Managing data in, or from, multiple disparate systems requires a new approach

CONVENTIONAL APPROACHES
- ETL: Copy and transform schemas and data to a one-size-fits-all data warehouse
- Copy: To a single Data Lake/Big Data repository
- Federate: Submit queries through adapters to source systems
- Search: E.g., Solr/Elasticsearch - copy, read, parse and index data, process queries to provide data
- Big Data options for storing data

HURDLES: cultural, technical, security and privacy

The new approach leverages the advantages of each conventional approach

Typical Data Warehouse/Big Data/Data Lake/Search
[Diagram: each Source -> Extract -> Data and Schema Transform -> Load -> Data Warehouse/Big Data/Data Lake/Search -> Application(s)]
- Queries resolved in the Data Warehouse/Big Data/Data Lake/Search
- Expensive in terms of time and cost to implement and maintain

Typical federated data access with conventional adapters
[Diagram: each Source -> Connector -> Adapter -> Middleware -> Application(s)]
- Queries resolved mainly in the data source
- Expensive to implement and limits capabilities

ETL has been the only option to come close to meeting the goals for an optimal data architecture, until now.

Introducing Software Defined Data (SDD), consisting of unconventional federated adapters that Read, Transform (process) and Index (RTI) source data and process queries against these indexes.
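The Read, Transform and Index (RTI) idea can be illustrated with a minimal Python sketch. It is not WhamTech's implementation: the source rows, the `transform` rule and the function names are all hypothetical, and a real adapter would read from a live system rather than an in-memory list. The point it shows is that the index holds cleaned values plus row pointers, not a copy of the rows, and that a query is resolved in the index before any row is fetched from the source.

```python
from collections import defaultdict

# Hypothetical source system: rows are addressed by their position in the list.
SOURCE_ROWS = [
    {"id": 1, "name": "  alice SMITH ", "city": "dallas"},
    {"id": 2, "name": "Bob Jones", "city": "Austin "},
    {"id": 3, "name": "ALICE smith", "city": "Houston"},
]


def transform(value):
    """Clean-up step (assumed rule): collapse whitespace and lower-case."""
    return " ".join(str(value).split()).lower()


def build_index(rows, column):
    """Read each row, transform the column value, and index it.

    The index maps a cleaned value to row positions (pointers);
    it does not hold a copy of the row itself.
    """
    index = defaultdict(list)
    for position, row in enumerate(rows):
        index[transform(row[column])].append(position)
    return index


def query(index, rows, value):
    """Resolve the query in the index, then fetch matching rows from the source."""
    pointers = index.get(transform(value), [])
    return [rows[p] for p in pointers]


name_index = build_index(SOURCE_ROWS, "name")
matches = query(name_index, SOURCE_ROWS, "Alice Smith")
print([row["id"] for row in matches])
```

Because the transform runs both at index build time and at query time, the dirty variants "  alice SMITH " and "ALICE smith" resolve to the same index entry without the source rows being modified.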

SDD initial discovery, index and adapter configuration, index build and Standard Data View (SDV) mapping
[Diagram: Source -> Read, Transform/clean-up (and Index) -> Indexes -> SDD Adapter w/SDV]
Diagram labels:
- Discovery and raw index-based Profiling
- Metadata Discovery and Semantic Mapping
- Classification and Security
- Develop and test Transforms using profiles
- Distributed Metadata Repository, incl. Governance
- Source Discovery
- Network Asset and Device Discovery
Notes:
- Twelve ways to build and maintain indexes
- Index schema and names same as data source
- Indexes do not store data, only queryable representations
- Indexes mapped to Standard Data View (SDV)

SDD index update, query processing and results retrieval
[Diagram: Source -> Read, Transform/clean-up (and Index) -> Indexes -> SDD Adapter w/SDV -> SDD Federation Server (sub-middleware) w/SDV -> Middleware -> Application(s). Also shown: Distributed Metadata Repository, incl. Governance; SDD Federation Server; Multiple other data sources]
- Continuous EIQ Indexes updates
- Queries resolved in the adapter and indexes
- Result-set pointers to data in source
- Raw results data transformed/cleaned-up from source
- User-level access
- Applications/middleware connect with standard drivers, APIs, Web/data services and SQL
- Results provided in almost any format
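As a rough illustration of the flow on this slide, the Python sketch below federates one query across two adapters that map differently named source columns to a common view. Everything in it is an assumption made for illustration: the `Adapter` class, the `SDV_MAPPINGS` table, the field names and the clean-up rules are hypothetical and do not represent the actual SDD adapter or federation server APIs.

```python
# Hypothetical sources with different schemas and dirty values.
CRM_ROWS = [{"cust_nm": "Ann Lee", "ph": "214-555-0101"}]
ERP_ROWS = [
    {"CUSTOMER": "ann lee", "TEL": "(214) 555 0101"},
    {"CUSTOMER": "Raj Patel", "TEL": "512.555.0199"},
]

# Assumed Standard Data View mapping: source column -> common field name.
SDV_MAPPINGS = {
    "crm": {"cust_nm": "customer_name", "ph": "phone"},
    "erp": {"CUSTOMER": "customer_name", "TEL": "phone"},
}


def clean(field, value):
    """Assumed clean-up rules: digits only for phones, title case for names."""
    if field == "phone":
        return "".join(ch for ch in value if ch.isdigit())
    return " ".join(value.split()).title()


class Adapter:
    """Indexes one source under SDV field names; keeps pointers, not copies."""

    def __init__(self, name, rows):
        self.name = name
        self.rows = rows
        self.mapping = SDV_MAPPINGS[name]
        self.index = {}  # (sdv_field, cleaned value) -> list of row positions
        for position, row in enumerate(rows):
            for src_field, sdv_field in self.mapping.items():
                key = (sdv_field, clean(sdv_field, row[src_field]))
                self.index.setdefault(key, []).append(position)

    def query(self, sdv_field, value):
        """Resolve in the index, read raw rows from the source, clean on the way out."""
        key = (sdv_field, clean(sdv_field, value))
        results = []
        for position in self.index.get(key, []):
            raw = self.rows[position]
            record = {sdv: clean(sdv, raw[src]) for src, sdv in self.mapping.items()}
            record["source"] = self.name
            results.append(record)
        return results


def federated_query(adapters, sdv_field, value):
    """Federation step: send the same SDV query to every adapter and merge results."""
    merged = []
    for adapter in adapters:
        merged.extend(adapter.query(sdv_field, value))
    return merged


adapters = [Adapter("crm", CRM_ROWS), Adapter("erp", ERP_ROWS)]
rows = federated_query(adapters, "phone", "214 555 0101")
for row in rows:
    print(row)
```

The application only sees the common field names (`customer_name`, `phone`), while each source keeps its own schema and its raw values.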

SDD adapters can co-exist with other types of adapters to System of Records
[Diagram components: firewalls; RDBMS, Social Media Feed, Hadoop, Mainframe, ERP System and Salesforce; Indexes and SDD Adapters; a Conventional Adapter; a 3rd Party Adapter; SDD Federation Servers; TCP/IP; ODBC/JDBC Driver, REST API, etc.; Application(s)]

SDD indexes and adapters can be deployed and accessed anywhere, and at any level, in multiple combinations.
Indexes are 100% contiguous, regardless of where or how deployed.

How does Software Defined Data (SDD) compare with other approaches?

[Comparison table, built up one goal per slide. Columns: SDD; ETL to a Data Warehouse; Big Data Lake; Conventional Federated Adapters; Solr/Elasticsearch. The rating marks in the cells are not recoverable.]

Goal #1: Complete, clean, transformed, standardized and secure data, and master data, for multiple applications
Goal #2: Near real-time: minimize update and query latency
Goal #3: Automation, including workflow and event processing
Goal #4: Support reporting, BI and analytics, including graph database
Goal #5: Minimize copies of data
Goal #6: Data discovery, metadata repository and data governance
Goal #7: Write back to data sources

How SDD meets goals for an optimal data architecture

Goal #1: Complete, clean, transformed, standardized and secure data, and master data, for multiple applications
- Process source data while building and maintaining indexes and master data, and while reading raw results data
- Multiple indexes, views, means of access and result formats

Goal #2: Near real-time: minimize update and query latency
- Changed data capture
- High-performance, parallel distributed processing; almost no load on data sources

Goal #3: Automation, including workflow and event processing
- Index monitoring, REST APIs and workflow integration

Goal #4: Support reporting, BI and analytics, including graph database
- Indexed views, provision of highly curated data to analytics, running analytics, and a built-in virtual graph database and link analysis

Goal #5: Minimize copies of data
- Can leave and secure data in sources, a Data Lake or indexes

Goal #6: Data discovery, metadata repository and data governance
- Use raw indexes for discovery and metadata, combined with IAM and RBAC for data governance from the edge/bottom up

Goal #7: Write back to data sources
- Can read as well as insert, delete and update
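The changed data capture point under Goal #2 can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the product's mechanism: the event shape (`key`, `before`, `after`), the `normalize` rule and the function names are hypothetical. It shows why an index kept current from change events needs no full rebuild and places no query load on the source: each event touches only the entries for the old and new values.

```python
def normalize(value):
    """Assumed clean-up rule applied before a value is indexed."""
    return value.strip().lower()


def apply_change(index, event):
    """Update the index from one change event.

    event is a dict with "key", "before" and "after":
    an insert has before=None, a delete has after=None.
    """
    key = event["key"]
    if event["before"] is not None:
        old = normalize(event["before"])
        keys = index.get(old)
        if keys is not None:
            keys.discard(key)
            if not keys:
                del index[old]  # drop empty entries so the index stays compact
    if event["after"] is not None:
        index.setdefault(normalize(event["after"]), set()).add(key)


# Hypothetical change stream captured from a source table's "city" column.
events = [
    {"key": 1, "before": None, "after": "Dallas"},        # insert
    {"key": 2, "before": None, "after": " dallas "},      # insert
    {"key": 1, "before": "Dallas", "after": "Austin"},    # update
    {"key": 2, "before": " dallas ", "after": None},      # delete
]

city_index = {}
for event in events:
    apply_change(city_index, event)

print(city_index)
```

Only the source keys are stored against each cleaned value, so the index stays a queryable representation of the source rather than a second copy of it.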

Software Defined Data (SDD)
- Implementation and alignment of use cases is the key to driving Enterprise IP. Technology prohibits this due to binding of data elements within applications
- Freeing data to create workflows will dramatically reduce time to market
- Incrementally developing enterprise use cases through SDD drives innovation to a next-gen path, allowing a reinvention of the enterprise
- Agnostic de-coupling of silo solutions drives speed to market
- Use-case consumption for any user, on any device, anywhere, securely, enhances collaboration and productivity

Software Defined Data (SDD)
- Process to eliminate upwards of 95% of today's regression issues translates into 50-75% calendar time savings
- Business logic and code base functions grow incrementally as business dictates
- Cumulative code growth ensures reuse, optimal performance and agnostic access. You also benefit from globally available logic
- Zero-impact deployments eliminate downtime and simplify the SDLC
- Dynamic, intelligent workflows consume new features when live

AI and NLP are the new UI for many applications
- Leveraging next-gen cognitive rapidly delivers functionality and results
- Millennial workforce alignment
- Dramatic decrease in training requirements
- Dramatic decrease in time to market on features and results
- Disconnect 3rd-party backend solutions from natural language UI
- Allow seamless app upgrades by using AI UI, which will interact with both old and new systems

Abstracted 3-tier architecture connected through a Smart Fabric
[Diagram tiers: SYSTEM OF ENGAGEMENT - APPLICATIONS; SYSTEM OF INSIGHTS - ANALYTICS WAREHOUSE; SYSTEM OF RECORDS - INTERNAL / EXTERNAL DATA SOURCES. Other labels: Platform Management Services; Ecosystem Solutions; Line-of-Business Users; SDD Fabric; API Catalog]

SDD is the First Paradigm Shift in how data and analytics are managed in a common meta-object framework
[Diagram tiers: SYSTEM OF ENGAGEMENT - APPLICATIONS; SYSTEM OF INSIGHTS - ANALYTICS WAREHOUSE; SYSTEM OF RECORDS - INTERNAL / EXTERNAL DATA SOURCES]
Diagram labels:
- Identity Management; Access Control; Other Corporate Governance; Application and Other Corporate Security
- Standard drivers, APIs, Web/data services, SQL, Spark SQL and other query languages
- Distributed Data Management, Including Source Results Read and Write Back
- INDEXES*; Metadata Repository; Governance; Role-based Access Control; API Management; Business Rules; Discovery; Profiling; Security; Analytics; Quality; Transformation; Standardization; Relationships; BPM Software; Master Data Management; Audit Logs; Virtual Graph Database
- Optional Data Lake: Real-time, Incremental or Batch Updated, Centralized or Distributed

*Indexes per data source on structured, unstructured and semi-structured data that can either store or not store (default) data

The Second Paradigm Shift is the concept of the Analytics Warehouse

An example of an Analytics Warehouse architecture: data ingestion/warehouse model

An example of an Analytics Warehouse architecture spanning enterprise systems: federated model

Examples of applying SDD and an Analytics Warehouse to healthcare analytics

[Diagram: SDD Fabric Healthcare Clinical Network Management, 3-Tier Architecture]
- 3rd Party Partner Ecosystem; Healthcare Cloud
- SoE: End-End Integrated Application Ecosystem Environment across Continuum of Care
- SoI: Analytical Insights Catalog and Ecosystem (Clinical, Financial, Administrative, Operational)
- SoR: Enterprise/Ecosystem-wide Master Data Management Platform; Longitudinal Patient Record
- Development and IT Operations Support Environment
- Human Capital Management, Finance, Sales & Marketing
- AI Driven Operations Automation
- End-to-End Security

SDD Fabric enables a Longitudinal Patient Record (LPR) view across multiple Systems of Records, across multiple enterprises
- Transparent distributed data management layer that plugs-and-plays in existing IT infrastructures
- Complements and leverages existing IT systems, tools and applications
- Leave and guard data in sources, in copies (e.g., a Data Lake), or stored in indexes: a hybrid approach
- Address upfront data discovery, security, quality, standards, MDM and other data-related processes
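To make the Longitudinal Patient Record idea concrete, here is a small Python sketch that assembles one patient's timeline from two systems while leaving the rows in their sources. It is an illustration only: the two source tables, the field names and the exact-match rule on cleaned name plus date of birth are hypothetical, and real master data matching is typically far more involved (probabilistic matching, multiple identifiers, manual review).

```python
from collections import defaultdict

# Hypothetical Systems of Records with different identifiers and name formats.
HOSPITAL = [
    {"mrn": "H-17", "name": "Maria Gomez", "dob": "1980-02-14",
     "date": "2018-03-01", "event": "admission"},
]
CLINIC = [
    {"pid": "C-204", "name": "GOMEZ, MARIA", "dob": "1980-02-14",
     "date": "2017-11-20", "event": "HbA1c test"},
    {"pid": "C-377", "name": "Lee, Sam", "dob": "1975-07-09",
     "date": "2018-01-05", "event": "office visit"},
]


def match_key(name, dob):
    """Assumed matching rule: cleaned 'first last' name plus date of birth."""
    if "," in name:
        last, first = [part.strip() for part in name.split(",", 1)]
        name = f"{first} {last}"
    return (" ".join(name.lower().split()), dob)


def build_master_index(sources):
    """Map each match key to (system, row position) pointers; rows stay in place."""
    master = defaultdict(list)
    for system, rows in sources.items():
        for position, row in enumerate(rows):
            master[match_key(row["name"], row["dob"])].append((system, position))
    return master


def longitudinal_record(master, sources, name, dob):
    """Follow the pointers into each source and return events in date order."""
    pointers = master.get(match_key(name, dob), [])
    events = [sources[system][position] for system, position in pointers]
    return sorted((e["date"], e["event"]) for e in events)


sources = {"hospital": HOSPITAL, "clinic": CLINIC}
master = build_master_index(sources)
record = longitudinal_record(master, sources, "Maria Gomez", "1980-02-14")
print(record)
```

The master index plays the role of a virtual master record: it links entries across systems without merging or moving the underlying data.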

Use cases from healthcare that combine data and analytics management

Use Cases: Applications

Clinical Applications:
- Diabetes, Hypertension, Heart Failure, etc.
- Gaps in Care
- Predictive Readmissions Management
- Clinical Wellness Management

Operational Management:
- Operational Management Hospital
- Operational Management Physician Practices
- Physician Quality Reporting Scores

Financial Performance:
- Financial Management Hospital
- Financial Management Physician Practices
- Claims Analytics

Regulatory Reporting:
- Hospital Value Based Purchasing (HVBP)
- HEDIS
- Patient Centered Medical Home Scorecard
- MU2 Clinical Quality Measures - Hospitals/Physicians
- MU2 Usage Scorecard - Physicians
- ACO Quality Reporting
- Hospital Outpatient Quality Reporting

Patient Similarity:
- Comparative Effectiveness Research
- Predictive models
- Chronic disease management
- Ability to create patient cohorts
- Cohort Manager/Chronic Condition Management

Population Management:
- Population Focus & Population Care

Conclusion of why ETL is no longer King, long live SDD
- SDD enables a data management paradigm shift
- SDD supports an analytics management paradigm shift
- ETL still has value, but not exclusively
- SDD can greatly enhance, complement and/or replace ETL in the future
- SDD is more suited than ETL to the new world of:
  - Data everywhere
  - API data services/catalog
  - Parallel distributed processing
  - Event and workflow processing
  - Near real-time architectures

Thank you

A presentation by Brian McCalley of DXC Technology, Glenn Field of SiriusIQ and Gavin Robertson of WhamTech, Inc.

Q&A