Leveraging Mainframe Data in Hadoop
- Edward Ross
- 5 years ago
1 Leveraging Mainframe Data in Hadoop. Frank Koconis - Senior Solutions Consultant; Glenn McNairy - Account Executive
2 Agenda: Introductions; The Mainframe: The Original Big Data Platform; The Challenges of Ingesting and Using Mainframe Data on Hadoop; Mainframe-Hadoop Data Integration Goals; Mainframe-to-Hadoop Migration / Integration Options; Syncsort and DMX-h; Live DMX-h Demo; Q & A
3 The Mainframe: The Original Big Data Platform. Mainframes handle over 70% of all OLTP transactions. They have a long, proven track record - over 60 years! They are reliable, operating continuously with zero downtime for years. They are secure: access is tightly restricted and managed.
4 Mainframes Still Process Vast Amounts of Vital Data: Top 25 World Banks; 9 of the World's Top Insurers; 23 of the Top 25 US Retailers
5 But now our organization is implementing Hadoop. Hadoop is the new Big Data platform, and the goal is for the Hadoop cluster to be the single central location for ALL data (the "Data Lake"). According to Wikipedia, this should be "the single store of all data in the enterprise ranging from raw data to transformed data"*. So you need to bring in all of the organization's data sources - and that includes the mainframe. The mainframe has vital data that you cannot afford to ignore when building your data lake. *-
6 Enterprise Data Lake Without Mainframe Data = Missed Opportunity
7 The Challenges of Using Mainframe Data in Hadoop
Mainframe knowledge and skills are difficult to find: the mainframe workforce is aging rapidly, knowledge of existing designs and code may no longer be available, and young developers almost never learn mainframe skills.
Security and connectivity issues: mainframes have a highly controlled security environment, and installation of data-extraction utilities or programs may be forbidden. The mainframe is mission-critical, so no action can be taken that could cause downtime.
Mainframe data looks VERY different from data on Windows, Linux or UNIX. This is so important, it deserves its own slide.
8 The Biggest Challenge: Mainframe Data Formats
Mainframe files are not like files in Windows, Linux or UNIX. There is no such thing as a delimited text file on the mainframe; file types include fixed-record, variable-record, VSAM and others.
The mainframe uses EBCDIC rather than ASCII, but it's not that simple: text values are EBCDIC, but many numeric values are not, so simple EBCDIC-to-ASCII conversion WILL NOT WORK.
Mainframe files can have VERY complex record structures. Records may be very wide, containing hundreds or thousands of fields, and are usually not flat: they often have sub-records and arrays (COBOL OCCURS groups), which may be nested many levels deep. Often, a range of bytes in a record is used in several different ways (COBOL REDEFINES), which means that the data looks different between records in the same file(!).
Record layouts are defined by COBOL copybooks; here are examples.
9 COBOL Copybook Example #1
Simple example of a COBOL copybook which defines a record layout:

       ** SALES ORDERS FILE
       01 SLS-ORD-FILE.
          05 CUSTOMER-ACCOUNT-NUMBER  PIC S9(9) COMP-3.
          05 ORDER-NUMBER             PIC X(10).
          05 ORDER-DETAILS.
             10 ORDER-STATUS          PIC X(1).
             10 ORDER-DATE            PIC X(10).
             10 ORDER-PRIORITY        PIC X(15).
             10 CLERK                 PIC X(15).
             10 SHIPMENT-PRIORITY     PIC S9(4) COMP-3.
             10 TOTAL-PRICE           PIC 9(7)V99 COMP-3.
             10 COMMENT-COUNT         PIC 9(2).
             10 COMMENT               PIC X(80) OCCURS 0 TO 99 TIMES
                                         DEPENDING ON COMMENT-COUNT.

The COMP-3 fields are packed decimal (not EBCDIC!), so EBCDIC-to-ASCII conversion would corrupt them. COMMENT is a variable-length array: the number of elements depends on the value of COMMENT-COUNT, so the size of this array will vary from record to record.
10 COBOL Copybook Example #2 (more complex)

       01 LN-HST-REC-LHS.
          05 HST-REC-KEY-LHS.
             10 BK-NUM-LHS            PIC S9(5) COMP-3.
             10 APP-LHS               PIC S9(3) COMP-3.
             10 LN-NUM-LHS            PIC S9(18) COMP-3.
             10 LN-SRC-LHS            PIC X.
             10 LN-SRC-TIE-BRK-LHS    PIC S9(5) COMP-3.
             10 EFF-DAT-LHS           PIC S9(9) COMP-3.
             10 PST-DAT-LHS           PIC S9(9) COMP-3.
             10 PST-TIM-LHS           PIC S9(7) COMP-3.
             10 TRN-COD-LHS           PIC S9(5) COMP-3.
             10 SEQ-NUM-LHS           PIC S9(5) COMP-3.
          05 LN-HST-REC-DTL-LHS.
             10 VLI-LHS               PIC S9(4) COMP.
             10 HST-REC-DTA-LHS.
                15 INP-SRC-COD-LHS    PIC S9(3) COMP-3.
                15 TRN-TYP-IND-LHS    PIC X.
                15 BAT-NUM-LHS        PIC S9(7) COMP-3.
                15 BAT-TIE-BRK-LHS    PIC X(3).
                15 BAT-ITM-NUM-LHS    PIC X(9).
                15 TML-NUM-LHS        PIC X(9).
                15 OPR-ID-LHS         PIC X(8).
                15 HST-ADL-IND-LHS    PIC X(1).
                15 HST-REV-IND-LHS    PIC X(1).
                15 TRN-AMT-LHS        PIC S9(9)V99 COMP-3.
                15 HST-DES-LHS        PIC X(25).
                15 CUR-PRC-DAT-LHS    PIC S9(9) COMP-3.
                15 REF-NUM-LHS        PIC X(3).
                15 INT-FEE-FLG-LHS    PIC X(1).
                15 UDF-L01-LHS        PIC X.
                15 PMT-HLD-DAY-LHS    PIC S9(3) COMP-3.
                15 AUH-NUM-LHS        PIC S9(5) COMP-3.
                15 CUR-LN-BAL-LHS     PIC S9(9)V99 COMP-3.
                15 ITM-CNT-LHS        PIC S9(2) COMP-3.
                15 PYF-COF-REA-COD-LHS PIC X(3).
             10 HST-TRN-ADL-DTA-LHS   PIC X(240).

There are several different ways that data may be stored in this 240-byte area:

             10 HST-TRN-RDF-1-LHS REDEFINES HST-TRN-ADL-DTA-LHS.
                15 HST-TRN-DTA-1-LHS OCCURS 20 TIMES.
                   20 SPR-TRN-COD-LHS PIC S9(5) COMP-3.
                   20 SPR-TRN-REF-LHS PIC X(3).
                   20 SPR-TRN-AMT-LHS PIC S9(9)V99 COMP-3.
             10 HST-TRN-RDF-2-LHS REDEFINES HST-TRN-ADL-DTA-LHS.
                15 HST-TRN-DTA-2-LHS.
                   20 OLD-NMN-DTA-LHS PIC X(40).
                   20 NEW-NMN-DTA-LHS PIC X(40).
                   20 DAT-TO-DSB-LHS  PIC S9(9) COMP-3.
                   20 RPT-BK-NUM-LHS  PIC S9(5) COMP-3.
                   20 RPT-APP-LHS     PIC S9(3) COMP-3.
                   20 RPT-LN-NUM-LHS  PIC S9(18) COMP-3.
                   20 CMB-PMT-PTY-LHS PIC S9(3) COMP-3.
             10 HST-TRN-RDF-3-LHS REDEFINES HST-TRN-ADL-DTA-LHS.
                15 HST-TRN-DTA-3-LHS.
                   20 HST-OLD-RT-LHS  PIC SV9(5) COMP-3.
                   20 HST-NEW-RT-LHS  PIC SV9(5) COMP-3.
             10 HST-TRN-RDF-4-LHS REDEFINES HST-TRN-ADL-DTA-LHS.
                15 HST-TRN-DTA-4-LHS.
                   20 VSI-PMT-AMT-LHS PIC S9(7)V99 COMP-3.
                   20 VSI-INT-AMT-LHS PIC S9(7)V99 COMP-3.
                   20 VSI-TRM-LHS     PIC S9(3) COMP-3.
                   20 INS-REF-NUM-LHS PIC X(3).
                   20 STR-DAT-VSI-LHS PIC S9(9) COMP-3.
             10 HST-TRN-RDF-5-LHS REDEFINES HST-TRN-ADL-DTA-LHS.
                15 HST-TRN-DTA-5-LHS.
                   20 NUM-MO-EXT-LHS  PIC S9(3) COMP-3.
                   20 CLC-EXT-FEE-AMT-LHS PIC S9(5)V99 COMP-3.
                   20 EXT-REA-LHS     PIC X(1).
             10 HST-TRN-RDF-6-LHS REDEFINES HST-TRN-ADL-DTA-LHS.
                15 HST-TRN-DTA-6-LHS OCCURS 11 TIMES.
                   20 ASD-BK-NUM-LHS  PIC S9(5) COMP-3.
                   20 ASD-APP-LHS     PIC S9(3) COMP-3.
                   20 ASD-LN-NUM-LHS  PIC S9(18) COMP-3.
                   20 PMT-AMT-LHS     PIC S9(9)V99 COMP-3.
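The copybook examples above show why a blind byte-level EBCDIC-to-ASCII pass fails: COMP-3 fields are packed decimal, not text. A minimal Python sketch of the difference follows; the sample bytes and the helper name are illustrative assumptions, not anything from the slides.

```python
import codecs

# EBCDIC *text* converts cleanly with a code-page translation (cp037 here)
assert codecs.decode(bytes([0xC6, 0xD9, 0xC1, 0xD5, 0xD2]), "cp037") == "FRANK"

def unpack_comp3(raw: bytes) -> int:
    """Decode an IBM packed-decimal (COMP-3) field: two digits per byte,
    with the final nibble holding the sign (0xD = negative)."""
    digits = []
    for b in raw[:-1]:
        digits += [b >> 4, b & 0x0F]
    digits.append(raw[-1] >> 4)          # last byte: one digit + sign nibble
    sign = -1 if (raw[-1] & 0x0F) == 0x0D else 1
    value = 0
    for d in digits:
        value = value * 10 + d
    return sign * value

# A PIC S9(9) COMP-3 field such as CUSTOMER-ACCOUNT-NUMBER occupies 5 bytes
acct = bytes([0x00, 0x12, 0x34, 0x56, 0x7C])   # packed +001234567
assert unpack_comp3(acct) == 1234567
# Running a code-page translation over those same bytes would yield garbage,
# which is why EBCDIC-to-ASCII conversion corrupts packed-decimal fields.
```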
11 Mainframe-Hadoop Data Integration Goals
1) Making mainframe data available and usable on the cluster: interpretation and conversion of mainframe data formats; data validation and cleansing; integration of mainframe data with non-mainframe sources; use of mainframe data for data warehousing and BI
2) Reducing mainframe costs for storage and/or CPU: low-cost archival or backup of mainframe data in its native format; processing mainframe data on the cluster (yes, it can be done!)
12 Mainframe-to-Hadoop Migration / Integration Options
So, what tools can be used for mainframe-Hadoop integration? The free open-source tools that come with Hadoop; open-source conversion code generators (JRecord and LegStar); mainframe-based migration tools; legacy-ETL vendors; Syncsort DMX-h. Let's look at the capabilities of each of these.
13 Integration Option: Open-source Hadoop Tools
Standard Hadoop tools are used to convert mainframe data to ASCII delimited-text format and process it. This is often the obvious choice because these tools come with Hadoop.
Steps to integrate ONE mainframe data file:
1) Copy the file from the mainframe to the edge node (using FTPS or a similar tool)
2) Execute a custom program (usually Java) to decompose complex record structures, convert mainframe data types to delimited text file(s) and write to HDFS
3) Delete the copy of the mainframe file on the edge node
4) Execute a custom data-validation/cleansing process using MapReduce or Spark on the cluster (normally Java or Hive)
5) Execute a custom MapReduce or Spark process to integrate or load into the final target (Data Lake, RDBMS, NoSQL database, etc.)
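Step 2 above is where most of the hand-written code goes: a converter that slices each fixed-length record at copybook offsets and emits delimited text. Here is a hedged Python sketch of that pattern; the layout, offsets and delimiter are invented for illustration, and a real converter would also decode COMP-3 and other numeric types rather than only EBCDIC text.

```python
import codecs

# Hypothetical fixed-record layout: (field name, start offset, end offset)
LAYOUT = [("ORDER-NUMBER", 0, 10), ("ORDER-STATUS", 10, 11), ("CLERK", 11, 26)]
RECORD_SIZE = 26

def record_to_delimited(record: bytes, delimiter: str = "|") -> str:
    """Slice one EBCDIC record by the layout and emit a delimited-text row."""
    assert len(record) == RECORD_SIZE
    fields = [codecs.decode(record[s:e], "cp037").strip() for _, s, e in LAYOUT]
    return delimiter.join(fields)

# Demo record, built from ASCII then encoded to EBCDIC for the example
rec = codecs.encode("ORD0000001" + "A" + "SMITH".ljust(15), "cp037")
assert record_to_delimited(rec) == "ORD0000001|A|SMITH"
```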
14 Integration Option: JRecord and LegStar
These are open-source code generators for file-format conversion. JRecord uses the CopybookLoader class to interpret COBOL record layouts. With LegStar, the developer must use its Cobol Transformer Generator to create a COBOL-to-XML translator, then call that translator in his/her program.
Steps to integrate ONE mainframe data file:
1) Copy the file from the mainframe to the edge node (using FTPS or a similar tool)
2) Execute a custom Java program to convert the mainframe file by calling methods of the CopybookLoader class (JRecord) or calling the file-specific COBOL-to-XML translator (LegStar) and write to HDFS. LegStar only: convert the XML output to delimited text file(s)
3) Continue with step #3 on the previous slide
15 Open-source Options: Pros and Cons
The one advantage of these options is that they are free. Ironically, the primary disadvantage of these free tools is cost.
Development effort is very high: a very large amount of custom coding is required, and a custom program is needed for each source file which cannot be re-used.
Lack of support: it is difficult and expensive to find, hire and retain skilled developers.
Complex mainframe record types are a challenge. Standard Hadoop tools have no easy way to handle complex records; with JRecord, the Java method calls can get very tricky; with LegStar, the COBOL Transformer Generator has limits.
Not future-proof: a Java program is written for a specific execution framework such as MapReduce or Spark - what will you do when another one comes?
16 Integration Option: Mainframe-based Tools
These are migration tools that run in zLinux on the mainframe system. They are able to ingest and convert mainframe file formats from z/OS, and results are written to HDFS or a database.
Advantages: does not stage data on the edge node.
Disadvantages: data validation and data quality checks require custom code; integration with other data sources requires custom code; the conversion process runs on the mainframe, not on commodity hardware.
17 Integration Option: Legacy-ETL Vendors
Many legacy ETL vendors now offer Hadoop versions, able to read mainframe files and write to HDFS. The primary advantage is the existing skill set of ETL developers ("the devil you know").
Disadvantages: very high cost; may have difficulty with very complex mainframe record structures; require a dedicated metadata repository, which is a single point of failure and becomes a performance bottleneck; do not process natively on the cluster. Some work only on the edge node, and those that work on the cluster are code generators (Java or Hive), so performance and scalability are limited.
18 The Best Option: Syncsort's DMX-h
Create complete mainframe-Hadoop integration solutions, including data validation and integration with other sources. Easy-to-use development GUI with no coding and a very short learning curve. Supports very complex mainframe record structures. Native execution on the cluster (NO code generation!) with superior performance. Runs on all major Hadoop distributions. "Future-proof": run ETL jobs on MapReduce, Spark or a future framework with no changes. So let's find out more about the company Syncsort and DMX-h.
19 Who is Syncsort?
Syncsort is a leading Big Data company that has been in the high-volume data business for over 45 years. Syncsort has successfully transformed its business model from the mainframe era to the age of Hadoop. Syncsort developed DMX, which benefits from the algorithms and coding efficiencies developed from its mainframe heritage.
20 Syncsort Products
Mainframe Solutions - MFX: High-performance Sort for System z; zIIP Offload for Copy; Hadoop Connectivity for Mainframe. Gold-standard sort technology for over four decades, saving customers millions each year over competitive sort solutions.
Linux/UNIX & Windows - DMX: High-performance ETL; SQL Analysis & Migration; ETL for Business Intelligence; Mainframe Re-hosting. Full-featured data integration software that helps organizations extract, transform and load more data in less time, with fewer resources and less cost.
Hadoop Solutions - DMX-h: Hadoop ETL; ETL for Business Intelligence; Mainframe-Hadoop Integration. A smarter approach to Hadoop ETL: easier to develop, faster, lower-cost and future-proof.
21 DMX-h Installation Architecture
The development GUI (the DMX-h Job Editor and Task Editor) is installed on Windows workstations. To execute on the cluster, the GUI sends a request to the DMX-h agent on the edge node (Linux). The DMX engine is installed on the Windows workstation AND the edge node AND all cluster data nodes, allowing job execution anywhere.
22 DMX-h Mainframe-Hadoop Integration Features
Mainframe file conversion and processing: fixed-record, variable-record and VSAM files; mainframe DB2 tables; EBCDIC text and mainframe numeric types (COMP- types); complex record structures, nested to any depth (REDEFINES, OCCURS and OCCURS DEPENDING ON)
Secure transfer from the mainframe using FTPS and Connect:Direct
Support for mainframe file compression, saving storage and time
No need to stage data on the edge node
Ability to store and process mainframe data in HDFS in its native format, without conversion(!), when desired
Easy integration of mainframe data with other sources
23 How Easy is it to Interpret a Mainframe File?
I'll demonstrate using my laptop. The use case: we have been given a mainframe file and the COBOL copybook containing the record layout. The only 2 things that we have been told are that it is a fixed-record file and that the record size is
24 Use DMX-h to Easily Integrate Mainframe Data With
25 Mainframe-Hadoop Integration Use Cases
Getting and interpreting the data (with no staging!): reading from the mainframe; conversion from mainframe formats (when desired); data validation and cleansing; writing to the cluster target
Processing and data integration: joins and lookups to cluster and non-cluster sources; normalization & aggregation
Publishing and exporting: load external data warehouses (Oracle, Teradata, DB2, SQL Server, etc.); efficiently generate data extracts for BI users; generate native files for Tableau and QlikView
Storing and processing data in mainframe-native format: only DMX-h can do this! (more info later)
26 Use Case: Mainframe Data Ingestion
A DMX-h job running on the edge node* can connect to both HDFS and an external data source (such as the mainframe). This uses no disk space on the edge node, so there is no limit on file size! This also works for any external source or database, even if it is remote, and the source file can even be compressed. Format conversion and data validation can be done within the same job.
* Can also be done using ANY node on the cluster, if network connectivity allows
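The "no disk space on the edge node" claim boils down to streaming: each chunk read from the source connection is written straight to the HDFS output stream, so the full file is never staged locally. A rough, framework-neutral Python sketch of the idea (the function and the in-memory stand-ins are my illustration, not DMX-h's API):

```python
import io

def stream_copy(read_chunk, sink, chunk_size=64 * 1024):
    """Copy source to sink chunk-by-chunk; only one chunk is ever held in
    memory, and nothing is written to local disk."""
    total = 0
    while True:
        chunk = read_chunk(chunk_size)
        if not chunk:
            break
        sink.write(chunk)
        total += len(chunk)
    return total

source = io.BytesIO(b"x" * 200_000)   # stand-in for the mainframe FTPS stream
target = io.BytesIO()                 # stand-in for an HDFS output stream
assert stream_copy(source.read, target) == 200_000
assert target.getvalue() == b"x" * 200_000
```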
27 Processing in the Cluster Using DMX-h
Once data is in the cluster, additional DMX-h jobs can transform it. The developer defines the operations to be performed (join, lookup, aggregate, filter, reformat, etc.); there is no need to know the details of MapReduce or Spark. DMX-h Intelligent Execution (IX) automatically runs the jobs on the cluster. DMX-h jobs run natively on all cluster nodes with no code generation, because the DMX engine is installed on all nodes. This is more efficient than Hive and other ETL tools, which generate Java code. Cluster nodes work concurrently, making the process highly scalable.
28 DMX-h Intelligent Execution on Hadoop
DMX-h has a feature called Intelligent Execution (IX) which automatically runs ETL jobs on the Hadoop cluster. The DMX engine is installed on all nodes in the cluster, so the transformations run natively, with no extra code-generation step. IX works when the job runs, not at design time. It currently supports MapReduce and Spark, and it could support other execution frameworks in the future with no changes to your DMX-h jobs in production. So this means that the SAME DMX-h job can run on your Windows laptop (useful during development for unit testing), on an edge node or any single cluster node, on the cluster using MapReduce, or on the cluster using Spark.
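Conceptually, Intelligent Execution separates what a job does from where it runs: the transformation is defined once and the execution framework is chosen at run time. This toy Python sketch (my illustration, not DMX-h internals) shows the same job producing identical results under a local runner and a partitioned, cluster-style runner:

```python
def job(records):
    """Framework-neutral transformation: filter, then reformat."""
    return [r.upper() for r in records if r.startswith("a")]

def run_local(job, data):
    """Run the whole job in one process (e.g. a laptop, for unit testing)."""
    return job(data)

def run_partitioned(job, data, partitions=2):
    """Stand-in for a distributed runner: apply the same job per partition."""
    size = max(1, (len(data) + partitions - 1) // partitions)
    out = []
    for i in range(0, len(data), size):
        out.extend(job(data[i:i + size]))
    return out

data = ["alpha", "beta", "apple", "cedar"]
assert run_local(job, data) == run_partitioned(job, data) == ["ALPHA", "APPLE"]
```

The design point is that swapping `run_local` for `run_partitioned` (or, in DMX-h's case, MapReduce for Spark) requires no change to the job definition itself.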
29 Processing Native-Mainframe Data on Hadoop (!)
Using DMX-h, it is actually possible to store and process mainframe data on Hadoop in its original native-mainframe format. DMX-h can even write mainframe-format target files - no other tool can do this! Sometimes this is a great idea; for example, you can:
Use HDFS to archive mainframe datasets (MUCH cheaper than DASD). Because the data is 100% unchanged, it will pass any auditing requirement.
Quickly move mainframe datasets to Hadoop. Sometimes you do not have time or resources for a conversion project, or you may not immediately know which data fields will need to be used; the data can be moved, unchanged, and converted later.
Transform the native-mainframe data using MapReduce or Spark. The results can even be moved back to the mainframe and used there! This allows you to offload CPU from the mainframe, reducing MIPS cost.
The bottom line is that DMX-h can convert your mainframe data or work with it in its native form, whichever makes sense for you.
30 DMX-h Live Demo
So let's see it actually work using some mainframe data.
31 DMX-h: Superior Performance and Easy Development
Study by Principled Technologies for Dell: a development comparison using DMX-h and open-source Hadoop tools across three different ETL processes (see table below). The open-source jobs were built by an experienced Hadoop developer; the DMX-h jobs were built by an entry-level developer with a few days of DMX-h training, and beat the performance of the open-source jobs on the same cluster.

Job execution time (minutes):
ETL Process                          | Open-source | DMX-h | DMX-h Advantage
Fact Dimension Load with Type-2 SCD  | 36:39       | 30:11 | 18%
Data Validation                      | 15:45       | 6:15  | 60%
Mainframe File Integration           | 5:51        | 4:48  | 18%

And DMX-h development was much quicker: the open-source jobs took the experienced developer 8.4 days, while the DMX-h jobs took the entry-level developer 3.8 days (54% less!).
32 Resources
Syncsort: Frank Koconis - Senior Solutions Consultant; Glenn McNairy - Account Executive
Development Comparison by Dell and Principled Technologies: determined that DMX-h enables easier and faster development, lower development cost, and better performance. These are links to the actual reports from the study.
JRecord
LegStar
More informationC Exam Code: C Exam Name: IBM InfoSphere DataStage v9.1
C2090-303 Number: C2090-303 Passing Score: 800 Time Limit: 120 min File Version: 36.8 Exam Code: C2090-303 Exam Name: IBM InfoSphere DataStage v9.1 Actualtests QUESTION 1 In your ETL application design
More informationMetaSuite : Advanced Data Integration And Extraction Software
MetaSuite Technical White Paper March, 2000 A Minerva SoftCare White Paper MetaSuite : Advanced Data Integration And Extraction Software WP-FPA-101998 Content CAPITALIZE ON YOUR VALUABLE LEGACY DATA 3
More informationInnovatus Technologies
HADOOP 2.X BIGDATA ANALYTICS 1. Java Overview of Java Classes and Objects Garbage Collection and Modifiers Inheritance, Aggregation, Polymorphism Command line argument Abstract class and Interfaces String
More informationEvolving To The Big Data Warehouse
Evolving To The Big Data Warehouse Kevin Lancaster 1 Copyright Director, 2012, Oracle and/or its Engineered affiliates. All rights Insert Systems, Information Protection Policy Oracle Classification from
More informationComparison of SmartData Fabric with Cloudera and Hortonworks Revision 2.1
Comparison of SmartData Fabric with Cloudera and Hortonworks Revision 2.1 Page 1 of 11 www.whamtech.com (972) 991-5700 info@whamtech.com August 2018 Page 2 of 11 www.whamtech.com (972) 991-5700 info@whamtech.com
More informationHyperconverged Fabric
Use Case - Remote and Branch Office IT HiveIO Inc. 2018 All rights reserved Empower your Remote and Edge IT Landscapes with Hive FabricTM Remote and Branch Offices (ROBO) present a big challenge in terms
More informationPODIUM DATA SOURCE OVERVIEW DIFFERENTIATORS ARCHITECTURE & COMPONENTS DATA SOURCES EXECUTION & BEHAVIOR APPENDIX DATA SOURCE MANAGEMENT MODULE
PODIUM DATA SOURCE OVERVIEW DATA SOURCE MANAGEMENT MODULE DIFFERENTIATORS PODIUM IS A PLATFORM ADVANCED DATA INGESTION FIELD- LEVEL VISIBILITY SECURITY DATA VALIDATION AND PROFILING ARCHITECTURE & COMPONENTS
More informationDesigning your BI Architecture
IBM Software Group Designing your BI Architecture Data Movement and Transformation David Cope EDW Architect Asia Pacific 2007 IBM Corporation DataStage and DWE SQW Complex Files SQL Scripts ERP ETL Engine
More informationEMC Celerra CNS with CLARiiON Storage
DATA SHEET EMC Celerra CNS with CLARiiON Storage Reach new heights of availability and scalability with EMC Celerra Clustered Network Server (CNS) and CLARiiON storage Consolidating and sharing information
More informationAlexander Klein. #SQLSatDenmark. ETL meets Azure
Alexander Klein ETL meets Azure BIG Thanks to SQLSat Denmark sponsors Save the date for exiting upcoming events PASS Camp 2017 Main Camp 05.12. 07.12.2017 (04.12. Kick-Off abends) Lufthansa Training &
More informationOracle Data Integrator 12c: Integration and Administration
Oracle University Contact Us: Local: 1800 103 4775 Intl: +91 80 67863102 Oracle Data Integrator 12c: Integration and Administration Duration: 5 Days What you will learn Oracle Data Integrator is a comprehensive
More informationProgress DataDirect For Business Intelligence And Analytics Vendors
Progress DataDirect For Business Intelligence And Analytics Vendors DATA SHEET FEATURES: Direction connection to a variety of SaaS and on-premises data sources via Progress DataDirect Hybrid Data Pipeline
More informationPrivate Cloud Database Consolidation Alessandro Bracchini Sales Consultant Oracle Italia
Private Cloud Database Consolidation Alessandro Bracchini Sales Consultant Oracle Italia Private Database Cloud Business Drivers Faster performance Resource management Higher availability Tighter security
More informationAzure Data Factory VS. SSIS. Reza Rad, Consultant, RADACAD
Azure Data Factory VS. SSIS Reza Rad, Consultant, RADACAD 2 Please silence cell phones Explore Everything PASS Has to Offer FREE ONLINE WEBINAR EVENTS FREE 1-DAY LOCAL TRAINING EVENTS VOLUNTEERING OPPORTUNITIES
More informationBuilding Next- GeneraAon Data IntegraAon Pla1orm. George Xiong ebay Data Pla1orm Architect April 21, 2013
Building Next- GeneraAon Data IntegraAon Pla1orm George Xiong ebay Data Pla1orm Architect April 21, 2013 ebay Analytics >50 TB/day new data 100+ Subject Areas >100 PB/day Processed >100 Trillion pairs
More informationNatCDC/ NatCDCSP The Change Data Capture Solution For ADABAS
NatCDC/ NatCDCSP The Change Data Capture Solution For ADABAS Overview...2 Processing Overview... 3 Features... 3 Benefits... 4 NatCDC for Data Warehousing...6 Integration with Extraction, Transformation
More informationOracle Big Data. A NA LYT ICS A ND MA NAG E MENT.
Oracle Big Data. A NALYTICS A ND MANAG E MENT. Oracle Big Data: Redundância. Compatível com ecossistema Hadoop, HIVE, HBASE, SPARK. Integração com Cloudera Manager. Possibilidade de Utilização da Linguagem
More informationHow Apache Hadoop Complements Existing BI Systems. Dr. Amr Awadallah Founder, CTO Cloudera,
How Apache Hadoop Complements Existing BI Systems Dr. Amr Awadallah Founder, CTO Cloudera, Inc. Twitter: @awadallah, @cloudera 2 The Problems with Current Data Systems BI Reports + Interactive Apps RDBMS
More informationADABAS & NATURAL 2050+
ADABAS & NATURAL 2050+ Guido Falkenberg SVP Global Customer Innovation DIGITAL TRANSFORMATION #WITHOUTCOMPROMISE 2017 Software AG. All rights reserved. ADABAS & NATURAL 2050+ GLOBAL INITIATIVE INNOVATION
More informationA Review Paper on Big data & Hadoop
A Review Paper on Big data & Hadoop Rupali Jagadale MCA Department, Modern College of Engg. Modern College of Engginering Pune,India rupalijagadale02@gmail.com Pratibha Adkar MCA Department, Modern College
More information<Insert Picture Here> MySQL Web Reference Architectures Building Massively Scalable Web Infrastructure
MySQL Web Reference Architectures Building Massively Scalable Web Infrastructure Mario Beck (mario.beck@oracle.com) Principal Sales Consultant MySQL Session Agenda Requirements for
More information5 Fundamental Strategies for Building a Data-centered Data Center
5 Fundamental Strategies for Building a Data-centered Data Center June 3, 2014 Ken Krupa, Chief Field Architect Gary Vidal, Solutions Specialist Last generation Reference Data Unstructured OLTP Warehouse
More informationDQpowersuite. Superior Architecture. A Complete Data Integration Package
DQpowersuite Superior Architecture Since its first release in 1995, DQpowersuite has made it easy to access and join distributed enterprise data. DQpowersuite provides an easy-toimplement architecture
More informationData sources. Gartner, The State of Data Warehousing in 2012
data warehousing has reached the most significant tipping point since its inception. The biggest, possibly most elaborate data management system in IT is changing. Gartner, The State of Data Warehousing
More informationTeradata Analyst Pack More Power to Analyze and Tune Your Data Warehouse for Optimal Performance
Data Warehousing > Tools & Utilities Teradata Analyst Pack More Power to Analyze and Tune Your Data Warehouse for Optimal Performance By: Rod Vandervort, Jeff Shelton, and Louis Burger Table of Contents
More informationData Virtualization for the Enterprise
Data Virtualization for the Enterprise New England Db2 Users Group Meeting Old Sturbridge Village, 1 Old Sturbridge Village Road, Sturbridge, MA 01566, USA September 27, 2018 Milan Babiak Client Technical
More informationOracle Data Integrator 12c: Integration and Administration
Oracle University Contact Us: +34916267792 Oracle Data Integrator 12c: Integration and Administration Duration: 5 Days What you will learn Oracle Data Integrator is a comprehensive data integration platform
More informationReducing Costs and Risk with Enterprise Archiving
Reducing Costs and Risk with Enterprise Archiving Erno Rorive Staff System Engineer Information Intelligence Group 1 Key Challenges Enterprise Archiving Solution Meeting the Challenges Case Study Summary
More informationAb Initio Training DATA WAREHOUSE TRAINING. Introduction:
Ab Initio Training Introduction: Ab Initio primarily works with the best server-client model. It is considered to be the fourth generation platform, when it comes to data manipulation, data analysis and
More informationData Management Glossary
Data Management Glossary A Access path: The route through a system by which data is found, accessed and retrieved Agile methodology: An approach to software development which takes incremental, iterative
More informationData integration made easy with Talend Open Studio for Data Integration. Dimitar Zahariev BI / DI Consultant
Data integration made easy with Talend Open Studio for Data Integration Dimitar Zahariev BI / DI Consultant dimitar@zahariev.pro @shekeriev Disclaimer Please keep in mind that: 2 I m not related in any
More informationTECHED USER CONFERENCE MAY 3-4, 2016
TECHED USER CONFERENCE MAY 3-4, 2016 Bruce Beaman, Senior Director Adabas and Natural Product Marketing Software AG Software AG s Future Directions for Adabas and Natural WHAT CUSTOMERS ARE TELLING US
More informationCloud Computing & Visualization
Cloud Computing & Visualization Workflows Distributed Computation with Spark Data Warehousing with Redshift Visualization with Tableau #FIUSCIS School of Computing & Information Sciences, Florida International
More informationStreaming Integration and Intelligence For Automating Time Sensitive Events
Streaming Integration and Intelligence For Automating Time Sensitive Events Ted Fish Director Sales, Midwest ted@striim.com 312-330-4929 Striim Executive Summary Delivering Data for Time Sensitive Processes
More informationCOBOL-IT Compiler Suite
COBOL-IT Compiler Suite Enterprise Edition COBOL-IT Compiler Suite Enterprise Edition is an Enterprise COBOL Compiler Suite that is highly adapted to the needs of Enterprises with Mission Critical COBOL
More informationAccessibility Features in the SAS Intelligence Platform Products
1 CHAPTER 1 Overview of Common Data Sources Overview 1 Accessibility Features in the SAS Intelligence Platform Products 1 SAS Data Sets 1 Shared Access to SAS Data Sets 2 External Files 3 XML Data 4 Relational
More informationNormalized Relational Database Implementation of VSAM Indexed Files
Normalized Relational Database Implementation of VSAM Indexed Files Note: this discussion applies to Microsoft SQL Server, Oracle Database and IBM DB2 LUW. Impediments to a Normalized VSAM Emulation Database
More informationActual4Test. Actual4test - actual test exam dumps-pass for IT exams
Actual4Test http://www.actual4test.com Actual4test - actual test exam dumps-pass for IT exams Exam : 000-N20 Title : IBM Rational Enterprise Modernization Technical Sales Mastery Test v1 Vendors : IBM
More informationData Synchronization Data Replication Data Migration Data Distribution
Data Synchronization Data Replication Data Migration Data Distribution The right data in the right place at the right time. tcvision...is a cross-system solution for the timely, bidirectional data synchronization
More informationWHITEPAPER. MemSQL Enterprise Feature List
WHITEPAPER MemSQL Enterprise Feature List 2017 MemSQL Enterprise Feature List DEPLOYMENT Provision and deploy MemSQL anywhere according to your desired cluster configuration. On-Premises: Maximize infrastructure
More informationOracle Big Data SQL. Release 3.2. Rich SQL Processing on All Data
Oracle Big Data SQL Release 3.2 The unprecedented explosion in data that can be made useful to enterprises from the Internet of Things, to the social streams of global customer bases has created a tremendous
More informationAchieving Horizontal Scalability. Alain Houf Sales Engineer
Achieving Horizontal Scalability Alain Houf Sales Engineer Scale Matters InterSystems IRIS Database Platform lets you: Scale up and scale out Scale users and scale data Mix and match a variety of approaches
More information