Saving ETL Costs Through Data Virtualization Across The Enterprise


Saving ETL Costs Through Data Virtualization Across the Enterprise. IBM Data Virtualization Manager for z/OS. Marcos Caurim, z Analytics Technical Sales Specialist. 2017 IBM Corporation.

What Is Wrong with the Status Quo? There is not enough time in the day to move all the data. My mobile users expect to see current data, not yesterday's data.

Current Data Integration Limitations: Data Movement Using ETL Tools
[Diagram: data flows from systems of record (OLTP, files) through an ETL server and multiple staging servers into the warehouse, which serves SQL reporting, ad-hoc queries, and OLAP. The result: data inconsistency, high latency, and complex, high mainframe costs.]

ETL Drives Up Mainframe Costs
ETL costs are found in three areas:
- additional hardware, storage, and networking costs
- labor involved in managing file transfers
- wasted system cycles (MIPS)
An IBM study found that moving one terabyte of data, with three derivative copies each day, amortized over a four-year period, added up to $8,269,335. ETL is responsible for consuming 16-18% of total MIPS. (Clabby Analytics, "The ETL Problem", October 2013)

Virtualizing Data Movement
Data virtualization enables data structures that were designed independently (cloud, mainframe, RDBMS, big data, unstructured) to be leveraged together, from a single logical source, in real time, and without complex, costly data movement, serving web and mobile applications.

Data Virtualization Use Cases
- Modernization: faster, easier delivery of modern systems of engagement
- Real-time analytics: immediate insight into your customers and business
- Optimization: reduced cost and complexity of accessing mainframe data

IBM Data Virtualization Manager for z/OS

Cost-Efficient Information Processing
Mainframes have multiple processor types:
- General purpose processor (GPP): all processing counts against capacity
- Specialty engines: eligible workloads don't count against GPP capacity
IBM Data Virtualization Manager can run 99% of its own processing on the zIIP specialty engine, enabling mainframe data to be integrated in place without a processing penalty. [Diagram: eligible workloads can run on the zIIP, outside of the GPP.]

Typical ETL Process
Issues: data inconsistency (not timely), complex process, prone to errors, costly (high MIPS usage).
[Diagram: extracts from mainframe data sources (Adabas, IDMS, Natural, IMS, CICS, sequential files, Db2 for z/OS, VSAM) and non-mainframe sources (Db2 LUW, Informix, dashDB, Oracle, SQL Server via IBM Federation Server) are transformed into compatible formats, staged, and loaded into the warehouse for analytics and search.]

Augmenting ETL with Data Virtualization
All data transformations run on the zIIP specialty engine for significantly reduced MIPS capacity usage. Information is delivered in the right format, in real time, with combined mainframe and non-mainframe data delivered to analytics (a minimal client sketch follows).
[Diagram: IBM Data Virtualization Server for z/OS, on the IBM zIIP specialty engine, provides mapping, caching, map/reduce, join, query, parallel I/O optimization, design, security, monitoring, and metadata services. Consumers connect via SQL (JDBC/ODBC/DRDA), NoSQL (JSON), services (SOAP), and z/OS Connect (REST APIs) for analytics and search. Sources: Adabas, IDMS, Natural, IMS, CICS, sequential files, Db2 for z/OS, VSAM, and, via IBM Federation Server, Db2 LUW, Informix, Derby, Oracle, and SQL Server.]
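
To make the SQL access path concrete, here is a minimal JDBC client sketch. The JDBC URL, port, credentials, and the virtual table name VSAM_CUSTOMER are hypothetical placeholders, not documented product values; the actual driver class and URL format come from the Data Virtualization Manager documentation.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class DvQuerySketch {
        public static void main(String[] args) throws Exception {
            // Hypothetical endpoint for the virtualization server on z/OS
            String url = "jdbc:dv://zhost.example.com:1200";
            try (Connection conn = DriverManager.getConnection(url, "user", "pwd");
                 Statement stmt = conn.createStatement();
                 // VSAM_CUSTOMER is assumed to be a virtual table mapped over a
                 // VSAM file; the server resolves it in place, without data movement
                 ResultSet rs = stmt.executeQuery(
                     "SELECT CUST_ID, CUST_NAME FROM VSAM_CUSTOMER FETCH FIRST 10 ROWS ONLY")) {
                while (rs.next()) {
                    System.out.println(rs.getString("CUST_ID") + " " + rs.getString("CUST_NAME"));
                }
            }
        }
    }

The point of the sketch is that a mainframe file becomes an ordinary SQL table to any JDBC consumer, so existing reporting and analytics tooling needs no mainframe-specific code.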

Augmenting the Warehouse via the Data Virtualization Server
[Diagram: IBM Data Virtualization Server for z/OS, on the IBM zIIP specialty engine, joins VSAM data with warehouse data and delivers the combined result to analytics (sketch below). Consumers connect via SQL (JDBC/ODBC/DRDA), NoSQL (JSON), services (SOAP), and z/OS Connect (REST APIs) for analytics and search. Mainframe sources: Adabas, IDMS, Natural, IMS, CICS, sequential files, Db2 for z/OS, VSAM.]
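
As a sketch of the "join VSAM with DW data" pattern the slide calls out, a single federated query can combine a virtualized VSAM file with a warehouse table. All table and column names below are hypothetical placeholders.

    import java.sql.Connection;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.sql.Statement;

    public class WarehouseAugmentSketch {
        // Joins live VSAM data with warehouse data in one statement;
        // VSAM_ORDERS and DW.SALES_FACT are hypothetical object names.
        static void printOpenOrders(Connection conn) throws SQLException {
            String sql =
                "SELECT F.REGION, O.ORDER_ID, O.ORDER_STATUS "
              + "FROM DW.SALES_FACT F "
              + "JOIN VSAM_ORDERS O ON O.ORDER_ID = F.ORDER_ID "
              + "WHERE O.ORDER_STATUS = 'OPEN'";
            try (Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery(sql)) {
                while (rs.next()) {
                    System.out.printf("%s %s %s%n",
                            rs.getString(1), rs.getString(2), rs.getString(3));
                }
            }
        }
    }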

Complex ETL Script
[Diagram: a typical ETL environment with six flows: source system extract program, pre-landing ETL (flow 1), landing ETL (flow 2), staging ETL (flow 3), vendor extract ETL (flow 4), vendor landing ETL (flow 5), and vendor updates (flow 6), spanning the source systems, a services environment with hub key generation services, a database environment, vendor systems, cross-reference tables, and an enterprise exchange interface.]

SQL INSERT INTO ... SELECT Statement
A single SQL statement executed through the virtualization server can replace complex, hard-to-manage ETL scripts (a sketch follows).
[Diagram: IBM Data Virtualization Server for z/OS, on the IBM zIIP specialty engine, provides mapping, caching, map/reduce, query, parallel I/O optimization, design, security, monitoring, and metadata services. Interfaces: SQL (JDBC/ODBC/DRDA), NoSQL (MongoDB API), services (SOAP/REST/HTML), web (HTTP), and events (CDC/streams), serving web/mobile, ESB/ETL, analytics/search, and transactional consumers. Sources: SMF, syslogs, tape, Adabas, IDMS, Natural, IMS, CICS, sequential files, Db2 for z/OS, VSAM, plus Big SQL/Hadoop, MongoDB and, via IBM Federation Server, Db2 LUW, Informix, dashDB, Oracle, and SQL Server.]
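
A hedged sketch of the pattern this slide describes, with every table and column name a hypothetical placeholder: one INSERT INTO ... SELECT, executed through the virtualization server, stands in for the kind of multi-flow extract/transform/load script shown on the previous slide.

    import java.sql.Connection;
    import java.sql.SQLException;
    import java.sql.Statement;

    public class EtlReplacementSketch {
        // One set-based statement replaces separate extract, transform, and
        // load steps; the virtualization server resolves the (hypothetical)
        // VSAM_ACCOUNTS and DB2_CUSTOMER objects in place on z/OS.
        static int loadCustomerSummary(Connection conn) throws SQLException {
            try (Statement stmt = conn.createStatement()) {
                return stmt.executeUpdate(
                    "INSERT INTO DW.CUSTOMER_SUMMARY (CUST_ID, TOTAL_BALANCE) "
                  + "SELECT A.CUST_ID, SUM(A.BALANCE) "
                  + "FROM VSAM_ACCOUNTS A "
                  + "JOIN DB2_CUSTOMER C ON C.CUST_ID = A.CUST_ID "
                  + "GROUP BY A.CUST_ID");
            }
        }
    }

Because the join and aggregation run inside the virtualization server, the transformation work is zIIP-eligible rather than consuming general-purpose MIPS.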

Functional Architecture (today)
[Diagram: pipeline stages Input, Ingest/Transform, Persist, Analyse, Visualise/Interact. Transaction data flows through ETL into a landing zone, through data engineering ETL into the enterprise warehouse (visualization, reporting, dashboarding), and through further ETL into a Hadoop cluster (exploratory analytics).]

Functional Architecture (with an analytical LPAR on the mainframe)
[Diagram: the same pipeline stages, but mainframe transaction data goes through ETL/ELT only if necessary, into an analytical LPAR (if necessary), and from there through data engineering ETL into the enterprise warehouse and the Hadoop cluster as before.]

Analytics LPAR Architecture
[Diagram: a mainframe analytics LPAR hosting visualization tools (QMF, Cognos, XXX, distributed visualization) and a data access layer (SparkSQL, IDVM, BigSQL). Data sharing connects the transactional LPAR (DB2, IMS) to the analytics LPAR (DB2, IMS). Data stores: distributed DBs, Hadoop, DB2, dashDB, PDA, IDAA.]

Data Ingestion and Integration
Component description: the Data Integration component focuses on the processes and environments that deal with the capture, qualification, processing, and movement of data in order to prepare it for storage in the Data Repository Layer, which is subsequently shared with the analytical and data access applications and systems.
[Diagram: an OLTP LPAR (DB2, IMS) connected by data sharing to a Data Lake LPAR (DB2, IMS, IDAA, IDAA Loader, IDVM, Spark, CDC, existing Cobol apps) and to a distributed environment (ETL tools such as DataStage, Hadoop, DB2, dashDB, PDA).]
- IDAA Loader: loads non-Db2 for z/OS data (IMS, VSAM, logs, etc.) directly into IDAA; can accelerate exploration and discovery.
- CDC: updates, if needed, from an OLTP DB2 schema to an OLAP DB2 schema, and also to IDAA (both OLTP and OLAP).
- Existing Cobol apps: several Cobol programs are already deployed; leverage them in the new Data Lake LPAR to control the costs of data movement, and invest in exploration and discovery to reduce the total number of those programs.
- DataStage and other ETL tools: leverage IDVM or SparkSQL to connect to mainframe data when needed, reducing dependency on in-house Cobol development. These tools can be deployed on Linux on the mainframe to reduce latency and footprint, loading into Hadoop or into the warehouse or marts depending on the use case.
- Z Connector for Hadoop: accelerates movement of known mainframe data to the Hadoop environment.

IBM Analytics / Banking: Student Loan Processing
Optimizing ETL to enable faster loan review and approval.
- Mountains of data to process: poor data quality, complicated by millions of records to process, meant loads took 12 hours.
- Faster time to insight: accessing more than 7 million records went from 12 hours to less than 13 minutes.
- Improved TCO: complex in-memory joins were performed on the mainframe, with 93% of processing on the zIIP engine.
Software: IBM Data Virtualization Manager for z/OS.
The challenge: student loan processing was taking too long due to poor data quality and huge volumes of student data stored in IMS DB on the customer's z12 mainframe. With IBM Data Virtualization Manager for z/OS accessing more than 7 million IMS records, the cycle went from 12 hours (via ETL) to less than 13 minutes. Complex in-memory joins were performed on the mainframe, with 93% of the related processing running on the IBM z Integrated Information Processor (zIIP). The lending institution was able to use real-time insight to process student loans faster and more accurately, improving business efficiency and avoiding regulatory fines.

Unlocking Z for Real-Time Business Insight: IBM Data Virtualization Manager for z/OS
- Simple: transactional access, no data movement
- Open to all apps: modern APIs enable access, including to non-z/OS data
- Secure: avoid risk by reducing the movement of data off Z Systems
- Fast: exploits the Z architecture, including parallelism and in-memory processing
- Cost effective: keeps Z costs down with up to 99% zIIP offload

IBM Analytics / Insurance: North American Insurance Firm
Modernization to accelerate adding new online customers.
- From days to milliseconds: online account origination went from 3 days to 200 milliseconds.
- Improved operational efficiency: overcame time delays associated with inefficient batch processes.
- API-enabled IBM Z apps/data: enhanced developer productivity with APIs to actuarial data in IMS DB.
Software: IBM Data Virtualization Manager for z/OS; IBM z/OS Connect Enterprise Edition.
The challenge: new online customers at a major insurance company had to wait days for confirmation of coverage when adding a new insurance product (motorcycle, boat, RV, etc.). Batch processes associated with the policy management system running on their z13 mainframe meant that a new product request took approximately 3 business days to complete. Actuarial data in IMS DB was API-enabled using IBM Data Virtualization Manager, which allowed developers to incorporate risk calculations and cost estimates into the new online service. Online policy origination went from 3 days to 200 milliseconds, and 400+ new policies were registered in the first 2 weeks after going live.

IBM Analytics / Financial Services: Global Financial Services Firm
Real-time, self-service analytics for faster insight into customer investment needs.
- Huge data volumes: 15 VSAM files concatenated together brought back 17 million records.
- Faster time to insight: portfolio managers can provide timely investment advice.
- Real-time information: business analysts no longer wait for data to be loaded.
Software: IBM Data Virtualization Manager for z/OS.
The challenge: before doing analytics, business analysts had to enlist database programmers to create reports from VSAM data residing on the IBM z13 mainframe, and getting mainframe data into the data warehouse involved a complicated, multi-step extraction process that created delays. IBM Data Virtualization Manager enabled real-time access to IMS DB and VSAM data from the online dashboard of the business intelligence application. Analysts can respond faster to business requests for customer insights, enabling portfolio managers to use the intelligence to make more relevant, timely investment suggestions to their clients.

Thank You

Backup slides

Runtime Flow
[Diagram: sources (transactional, mainframe) feed a data integration layer into the analytical data lake storage: discovery and exploration, landing zone, enterprise warehouse (and marts), and archive, with data engineering, stewardship, and discovery supporting actionable insight, interactive workloads, and long-running workloads. Numbered arrows correspond to the steps below.]
1. Transaction data is extracted on a periodic basis from operational systems.
   - Mainframe data can be directly accessed for discovery and exploration (1.1).
   - Mainframe data is extracted based on need and use case; not all data needs to, or should, be moved (2.1).
2. Data is ingested into the analytics environment using an ETL engine (DataStage or BigIntegrate), which generates the technical and operational metadata and stores it in the metadata repository for access during data engineering, stewardship, and discovery.
3. Data is placed initially, when needed, in a landing zone (Hortonworks) where it can be staged, transformed, and integrated.
4. Data is then loaded into an enterprise warehouse (DB2, dashDB, PDA, IDAA) and possibly into downstream marts (IDAA) for reporting, dashboarding, and other interactive workloads.
5. As data ages, it is extracted from the enterprise warehouse (again using the ETL engine) and loaded into the archive repository (Hadoop), where it can be accessed for long-running workloads such as exploratory analytics.
   - DB2 for z/OS transactional historical data can leverage IDAA capabilities to archive data.
6. Data in either location (and its associated metadata) can be accessed for data engineering, modeling, etc., using InfoSphere Data Architect. It can be accessed for stewardship (curation, adding business metadata, etc.) and discovery using the Information Governance Catalog UI.
7. Business users and data scientists can access data either directly or through virtualization/federation tools such as BigSQL and IDVM, then visualize and analyze the data using their favorite tools (Cognos, SPSS, R Studio, QMF, etc.).

Mainframe Data Sources
Traditional sources: the original corporate data sources are still very valuable resources. They are made up of application data (CRM, HR, and other customer data systems), transactional data (sales, events, claims, etc.), systems of record (historical data, reference data, etc.), and third-party data (provided by 3rd-party organizations, e.g. census data). Sources: DB2, IMS, VSAM, other 3rd-party DBs, logs (SMF, RMF, middleware).
- DB2: high-performance RDBMS with in-memory capabilities for even faster performance, and NoSQL capabilities with native support for XML and JSON (up to 540 million transactions per hour arriving through a RESTful web API into DB2). Together with IDAA, it delivers true hybrid transactional/analytical processing.
- IMS: high-performance NoSQL (hierarchical) database. Fast Path high-volume transaction processing reaches a sustained average rate of over 117,000 transactions per second on a single IMS instance.
- VSAM: Virtual Storage Access Method, another NoSQL data store on the mainframe with extreme performance; DB2 and IMS are built on VSAM. VISA processes up to 145k transactions per second.
- Logs: another very important data source. Mainframe logs have well-defined data structures which can and should be used for analytics.

Mainframe Analytical Data Lake Storage
Component description: the overall purpose of the Mainframe Analytical Data Lake Storage component is to be a set of secure data repositories allowing for discovery and exploration of real-time data, performing actionable insight, and utilizing enhanced applications, without a need to physically move data from its source. Although not mandatory, it can be used to control mainframe costs and fine-tune workload management. Stores: DB2, IMS, IDAA.
- Data sharing: allows applications running on more than one DB2 or IMS subsystem to read and write the same set of data concurrently. Possible architectures include one DB2 member for transactional workload and one DB2 member for analytical workload. Avoid unnecessary ETL: start exploration and discovery right on transactional data without impacting applications.
- DB2 HTAP (Hybrid Transactional/Analytical Processing): leverage the same infrastructure to run any kind of workload. Databases on DB2 are logical objects, which makes it possible to have transactional and analytical data models controlled by the same RDBMS: an OLTP application can access analytical data, and an OLAP application can access transactional data.
- IDAA: can be used to deploy a data warehouse and/or specific data marts directly on the mainframe. IMS, VSAM, and other mainframe data can be loaded directly for use in temporal data marts. Archive historical DB2 data to free up mainframe storage while keeping it accessible.

Data Access
Component description: the overall purpose of the Data Access component is to express the various capabilities needed to interact with the Data Lake Repository component. These capabilities serve the access needs of data scientists, business analysts, developers, and others who need access to valuable data.
Data virtualization: describes any approach to data management that allows a user or application to retrieve and manipulate data without requiring technical details about the data, such as how it is formatted or where it is physically located. Access engines: SparkSQL, IDVM, BigSQL.
- SparkSQL: securely integrates OLTP and business-critical data and can access almost all types of mainframe data. It is the same distribution as elsewhere, so no mainframe-specific skills are needed, and the same set of application languages can be used: Scala, Python, Java, R, SQL. Can be called from BigSQL.
- IBM Data Virtualization Manager for z/OS (IDVM): the base for several IBM products such as QMF, Spark for z/OS, the IDAA Loader, etc. It virtualizes almost all data on the mainframe, including 3rd-party DBs like Adabas and IDMS, can virtualize BigSQL objects for easier integration with Hadoop environments, and can also virtualize other distributed data stores.
- BigSQL: Hadoop query engine derived from decades of IBM R&D investment in RDBMS technology, including database parallelism and query optimization. Can access DB2 for z/OS directly through a DRDA connection, and mainframe data through IDVM or SparkSQL.

Data Access thru SparkSQL, BigSQL, and IDVM
[Diagram: an application reaches mainframe data (DB2, IDAA, IMS, VSAM, other) and distributed data (distributed DBs, Hadoop, DB2, dashDB, PDA) through any combination of SparkSQL, BigSQL, and IDVM.]

Data Access thru BigSQL
A distributed application accesses data from several sources, e.g. Hadoop, DB2, and VSAM.
[Diagram: BigSQL reaches Hadoop through its native connection, calls SparkSQL through a UDF, and reaches the mainframe sources (DB2, IDAA, IMS, VSAM, other) through a JDBC connection to IDVM.]

Data Access thru IDVM
Any application (distributed or mainframe) accesses and joins data from several sources, e.g. Hadoop, DB2, and IMS (sketch below).
[Diagram: a single JDBC connection gives access to all data on the mainframe (DB2, IDAA, IMS, VSAM, other); BigSQL objects can be declared in IDVM to simplify access to distributed sources (distributed DBs, Hadoop, DB2, dashDB, PDA).]
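
A sketch of a cross-source join through IDVM, under the assumption that IMS_POLICY and DB2_CLAIMS are virtualized mainframe objects and HDP_CLICKS is a BigSQL object declared in IDVM; all object names are hypothetical.

    import java.sql.Connection;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.sql.Statement;

    public class CrossSourceJoinSketch {
        // Joins IMS, DB2, and Hadoop-backed data in one statement over a
        // single JDBC connection to IDVM; all object names are hypothetical.
        static void policyActivity(Connection conn) throws SQLException {
            String sql =
                "SELECT P.POLICY_ID, C.CLAIM_COUNT, H.CLICK_COUNT "
              + "FROM IMS_POLICY P "
              + "JOIN DB2_CLAIMS C ON C.POLICY_ID = P.POLICY_ID "
              + "JOIN HDP_CLICKS H ON H.POLICY_ID = P.POLICY_ID";
            try (Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery(sql)) {
                while (rs.next()) {
                    System.out.printf("%s claims=%d clicks=%d%n",
                            rs.getString(1), rs.getInt(2), rs.getInt(3));
                }
            }
        }
    }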

Data Access thru SparkSQL on the Mainframe
Data scientist tasks leverage mainframe data using Scala, Python, Java, R, or SQL through Spark on z/OS (a sketch follows).
[Diagram: Spark z/OS accesses DB2, IDAA, IMS, VSAM, and other mainframe sources, as well as distributed DBs, Hadoop, DB2, dashDB, and PDA.]
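
For illustration, a minimal Spark job in Java that reads a virtualized mainframe table over JDBC and explores it with SparkSQL; the JDBC URL, credentials, and the table name VSAM_ACCOUNTS are hypothetical placeholders, not documented values.

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class SparkMainframeSketch {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                    .appName("mainframe-exploration")
                    .getOrCreate();
            // Read a virtualized mainframe table through a (hypothetical) JDBC endpoint
            Dataset<Row> accounts = spark.read().format("jdbc")
                    .option("url", "jdbc:dv://zhost.example.com:1200")
                    .option("dbtable", "VSAM_ACCOUNTS")
                    .option("user", "user")
                    .option("password", "pwd")
                    .load();
            accounts.createOrReplaceTempView("accounts");
            // Exploratory aggregation in SparkSQL, without first staging the data off Z
            spark.sql("SELECT CUST_ID, AVG(BALANCE) AS AVG_BAL "
                    + "FROM accounts GROUP BY CUST_ID").show();
            spark.stop();
        }
    }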
