High Speed ETL on Low Budget


Introduction

Data acquisition and population of a warehouse have traditionally been carried out using dedicated ETL tools available in the market. An enterprise-wide data warehousing project will often budget a few hundred thousand dollars for an ETL tool such as Informatica or DataStage. These functionally rich, best-of-breed tools usually ship with their own ETL engine to process data, along with programmer- and user-friendly GUIs to design and schedule jobs. The underlying principle, however, remains the same: use SQL to read data, transform it, and write it to the target using SQL commands. The value such tools add is the ability to handle different kinds of data sources (flat files, XML, RDBMS, etc.) through standard interfaces and connectors, a responsive designer interface, and orderly job monitoring screens. But recent enhancements in SQL, together with new ETL-centric features introduced by the leading database vendors, redefine the way such data flows can be implemented: by using the database itself as the ETL engine.

ETL focused enhancements in Databases

Let's take a look at a few recently introduced ETL-focused features of the Oracle Database; similar features can be expected in other RDBMSs sooner or later.

External Tables - Virtual tables accessing external data residing in flat files

Oracle 9i's external table feature allows data in external sources such as flat files to be exposed in the database like data residing in a regular table. The external table acts as a virtual table that can be queried and joined directly, using the full power of SQL, PL/SQL or Java, without requiring the external data to be loaded into the database first. No DML operations or indexing are possible on such a table.

Figure 1 Concept of an External Table [2]
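As a minimal sketch of the feature described above, an external table over a comma-separated feed file might be declared as follows (the directory path, file name and columns are hypothetical, not taken from the case study):

```sql
-- Directory object pointing at the folder holding the flat files
CREATE DIRECTORY ext_dir AS '/data/feeds';

-- External table: the file is parsed on the fly at query time,
-- no loading step is required
CREATE TABLE orders_ext (
  order_number NUMBER,
  item_number  NUMBER,
  bundle_id    VARCHAR2(30)
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY ext_dir
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    FIELDS TERMINATED BY ','
    MISSING FIELD VALUES ARE NULL
  )
  LOCATION ('orders.csv')
)
REJECT LIMIT UNLIMITED;

-- The flat file can now be queried and joined like any table
SELECT COUNT(*) FROM orders_ext;
```

Because no data is loaded, dropping and re-pointing the table at a new file is cheap; the trade-off is that no DML or indexing is possible, as noted above.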

Table Functions - Pipelined and parallel execution of transformations implemented in PL/SQL, C or Java

In a typical ETL process, data extracted from a source system passes through a sequence of transformations before being loaded into the data warehouse. When the results of a transformation are too large to fit in memory, they must be staged in intermediate tables or flat files; this data is then read back and processed as input to the next transformation in the sequence. Such scenarios can, however, be handled without staging tables, which interrupt the data flow through the transformation steps. A table function is a function that takes a set of rows as input and produces a set of rows as output. These rows are processed iteratively in subsets, enabling a pipelined mechanism that streams subset results from one transformation to the next before the first operation has finished. Furthermore, table functions can be executed transparently in parallel.

Figure 2 Table Function Example [1]

Upsert Functionality - Conditional Updates and Inserts in one pass

Very often an upsert needs to be incorporated in ETL: insert a given row if it is entirely new, otherwise update the previously existing row in the target table. This functionality is delivered by the MERGE statement in SQL, which provides the ability to update or insert a row conditionally in a table. The MERGE syntax is now part of the SQL:2003 standard and is supported by most database vendors in their latest releases.

Case Study

Let's instantiate this idea with a case study that uses these ETL-focused features to drive home the point. Figures 3 and 4 are screenshots of a mapping built with an industry-standard ETL tool. It does the following: source data arriving in flat files had to undergo some transformation before being upserted (insert if not exists, else update) into the target DW tables.
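The pipelined table function mechanism described above can be sketched in PL/SQL as follows. This is a minimal illustration, not the function used in the case study; the type, function and column names are hypothetical:

```sql
-- Row and collection types for the function's output
CREATE TYPE order_row AS OBJECT (
  order_number NUMBER,
  item_number  NUMBER
);
/
CREATE TYPE order_tab AS TABLE OF order_row;
/

-- PIPELINED streams each row to the caller as soon as it is ready;
-- PARALLEL_ENABLE lets Oracle run multiple instances over the input cursor
CREATE OR REPLACE FUNCTION transform_orders(src SYS_REFCURSOR)
  RETURN order_tab PIPELINED PARALLEL_ENABLE (PARTITION src BY ANY)
IS
  v_order NUMBER;
  v_item  NUMBER;
BEGIN
  LOOP
    FETCH src INTO v_order, v_item;
    EXIT WHEN src%NOTFOUND;
    -- apply transformations here, then emit the row immediately:
    -- no staging table between this step and the consumer
    PIPE ROW (order_row(v_order, v_item));
  END LOOP;
  CLOSE src;
  RETURN;
END;
/
```

A consumer would then read the transformed stream with something like SELECT * FROM TABLE(transform_orders(CURSOR(SELECT order_number, item_number FROM orders_ext))), chaining the external table directly into the transformation.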

Figure 3 Mapping for Step STG1 to STG2
Figure 4 Mapping for Step STG2 to DW

In line with the environment setup at the customer, the loading process was phased in three steps: Flat file, STG1, STG2, DW.

Step 1: We used SQL*Loader to load STG1, i.e. Staging Area 1 (tool constraints prevented us from using the ETL tool for this; alternatively, an ETL mapping could do it too). Data from the files is brought in as-is to a table.

Step 2: STG1 to STG2 (Staging Area 2) is a mapping with transformations, followed by upsert logic on the STG2 table.

Step 3: STG2 to DW is a direct move to the DW target with a current-date lookup.

We then tried to achieve the same using native SQL and PL/SQL features, and were able to cover all three steps and execute them in one command. Here is how each step of the ETL data flow is addressed using the alternative approach:

Step 1: Use external tables to read from the flat files; there is no need to load a physical table in STG1. Instead of a populated STG1 table, just use SELECT * FROM <external table>.

Step 2: The SELECT from the external table serves as input to a pipelined table function. Any complex transformations can be coded inside the table function, which emits a table of records as soon as they are processed.

Step 3: Use a MERGE statement for the upsert. This single SQL command calls the table function (which reads from the external table, performs the transformations on the fly and delivers a transformed data stream for the upsert) and updates records that already exist, inserting them otherwise.

Figure 5 Schematic showing staged data flow vs. data flow with on-the-fly transformations [2]

As depicted in Figure 5, the conventional process of using SQL*Loader, mappings and staging tables to populate the warehouse can be replaced by a simple SQL statement. Below is the MERGE statement which delivers the entire process in one go:

    MERGE INTO dwtr_fusion_cto_bundle d                -- Upsert command
    USING (SELECT * FROM TABLE(fusion_feed)) f         -- Calling a table function
    ON (    d.heart_order_number = f.heart_order_number
        AND d.heart_item_number  = f.heart_item_number)
    WHEN MATCHED THEN UPDATE SET
        d.load_date = f.load_date
    WHEN NOT MATCHED THEN INSERT
        (heart_order_number, heart_item_number, high_level_item_number,
         bundle_id, load_date, bu_load_date)
        VALUES
        (f.heart_order_number, f.heart_item_number, f.high_level_item_number,
         f.bundle_id, f.load_date, f.bu_load_date);

Conclusion

Using SQL's versatility and the newly introduced functionality, such ETL tasks can be delivered at a fraction of the cost. Tools like Oracle Warehouse Builder and Microsoft DTS are based on the same fundamentals: they use the database as their processing engine and capitalize on its native features to offer a commendable ETL tool, and they cost much less than the more established ones. The gain in performance is another major plus. First, there is no data movement through a separate engine; second, parallel and bulk loading and upserts can be put to your advantage more readily. Finally, there are no overheads in setting up separate hardware or training developers; a team can get started as long as it is familiar with writing SQL. Guerrilla ETL scores well in terms of cost, performance and readiness.

References

[1] Oracle Data Warehousing Guide
[2] Hermann Baer, ETL Processing within Oracle9i, Oracle White Paper