Benefits of Automating Data Warehousing

Similar documents
This tutorial will help computer science graduates to understand the basic-to-advanced concepts related to data warehousing.

1 Dulcian, Inc., 2001 All rights reserved. Oracle9i Data Warehouse Review. Agenda

DATA MINING AND WAREHOUSING

Full file at

1 DATAWAREHOUSING QUESTIONS by Mausami Sawarkar

The Data Organization

WKU-MIS-B10 Data Management: Warehousing, Analyzing, Mining, and Visualization. Management Information Systems

1. Analytical queries on the dimensionally modeled database can be significantly simpler to create than on the equivalent nondimensional database.

TIM 50 - Business Information Systems

Q1) Describe business intelligence system development phases? (6 marks)

TIM 50 - Business Information Systems

Business Intelligence An Overview. Zahra Mansoori

Data Warehousing and OLAP Technologies for Decision-Making Process

Rocky Mountain Technology Ventures

DATA MINING TRANSACTION

The Evolution of Data Warehousing. Data Warehousing Concepts. The Evolution of Data Warehousing. The Evolution of Data Warehousing

IT DATA WAREHOUSING AND DATA MINING UNIT-2 BUSINESS ANALYSIS

IDU0010 ERP,CRM ja DW süsteemid Loeng 5 DW concepts. Enn Õunapuu

Data Warehousing. Adopted from Dr. Sanjay Gunasekaran

Information Management course

BUILD BETTER MICROSOFT SQL SERVER SOLUTIONS Sales Conversation Card

Teradata Aggregate Designer

Data warehouse architecture consists of the following interconnected layers:

A Novel Approach of Data Warehouse OLTP and OLAP Technology for Supporting Management prospective

D Daaatta W Waaarrreeehhhooouuusssiiinng B I R L A S O F T

CHAPTER 3 Implementation of Data warehouse in Data Mining

Oracle Database 11g: Data Warehousing Fundamentals

Data Warehouse and Mining

CHAPTER 8 DECISION SUPPORT V2 ADVANCED DATABASE SYSTEMS. Assist. Prof. Dr. Volkan TUNALI

Question Bank. 4) It is the source of information later delivered to data marts.

Fig 1.2: Relationship between DW, ODS and OLTP Systems

Management Information Systems MANAGING THE DIGITAL FIRM, 12 TH EDITION FOUNDATIONS OF BUSINESS INTELLIGENCE: DATABASES AND INFORMATION MANAGEMENT

Database design View Access patterns Need for separate data warehouse:- A multidimensional data model:-

The strategic advantage of OLAP and multidimensional analysis

Cognos also provides you an option to export the report in XML or PDF format or you can view the reports in XML format.

Topics covered 10/12/2015. Pengantar Teknologi Informasi dan Teknologi Hijau. Suryo Widiantoro, ST, MMSI, M.Com(IS)

Chapter 3. Databases and Data Warehouses: Building Business Intelligence

Data Warehouse and Data Mining

Business Intelligence and Decision Support Systems

Overview of Reporting in the Business Information Warehouse

TDWI Data Modeling. Data Analysis and Design for BI and Data Warehousing Systems

CS614 - Data Warehousing - Midterm Papers Solved MCQ(S) (1 TO 22 Lectures)

OLAP Introduction and Overview

The Data Organization

Data Warehouse and Data Mining

Chapter 6 VIDEO CASES

After completing this course, participants will be able to:

SAS Data Integration Studio 3.3. User s Guide

Data Mining Concepts & Techniques

by Prentice Hall

DATA WAREHOUSE EGCO321 DATABASE SYSTEMS KANAT POOLSAWASD DEPARTMENT OF COMPUTER ENGINEERING MAHIDOL UNIVERSITY

Chapter 13 Business Intelligence and Data Warehouses The Need for Data Analysis Business Intelligence. Objectives

Data Warehouses and Deployment

PREFACE INTRODUCTION MULTI-DIMENSIONAL MODEL. Dan Vlamis, Vlamis Software Solutions, Inc.

Training 24x7 DBA Support Staffing. MCSA:SQL 2016 Business Intelligence Development. Implementing an SQL Data Warehouse. (40 Hours) Exam

Exploiting Key Answers from Your Data Warehouse Using SAS Enterprise Reporter Software

Oracle Retail Data Model

Operating Systems: MS DOS, WINDOWS 3.1, 95, 98, WINDOWS 2000, WINDOWS NT & XP, UNIX various server platforms.

Microsoft SQL Server Training Course Catalogue. Learning Solutions

MS-55045: Microsoft End to End Business Intelligence Boot Camp

COURSE 20466D: IMPLEMENTING DATA MODELS AND REPORTS WITH MICROSOFT SQL SERVER

The EDW: An Overview For Foothill-De Anza Community College District

KORA. Business Intelligence An Introduction

Cisco Gains Real-time Visibility in the Business with SAP HANA

1. Attempt any two of the following: 10 a. State and justify the characteristics of a Data Warehouse with suitable examples.

SAS 9.4 Intelligence Platform: Overview, Second Edition

COMMUNICATION COMPUTER ENGINEERING Discovery Engineering, Volume 2, Number 8, November Engineering

Chapter 3: AIS Enhancements Through Information Technology and Networks

Data Mining & Data Warehouse

CHAPTER 8: ONLINE ANALYTICAL PROCESSING(OLAP)

Microsoft End to End Business Intelligence Boot Camp

DSS based on Data Warehouse

Dan Vlamis Vlamis Software Solutions, Inc Copyright 2005, Vlamis Software Solutions, Inc.

Data Integration and ETL with Oracle Warehouse Builder

Data Mining and Warehousing

IBM dashdb Local. Using a software-defined environment in a private cloud to enable hybrid data warehousing. Evolving the data warehouse

Microsoft SQL Server More solutions. More value. More reasons to buy.

Construction and Real Estate. Improve system performance and data security with SQL Server

Oracle Communications Data Model

Data Warehouses Chapter 12. Class 10: Data Warehouses 1

Aggregating Knowledge in a Data Warehouse and Multidimensional Analysis

Data Stage ETL Implementation Best Practices

QM Chapter 1 Database Fundamentals Version 10 th Ed. Prepared by Dr Kamel Rouibah / Dept QM & IS

REPORTING AND QUERY TOOLS AND APPLICATIONS

Evolving To The Big Data Warehouse

Data Warehousing. Overview

Data Mining. Data warehousing. Hamid Beigy. Sharif University of Technology. Fall 1394

Data Warehousing. Data Warehousing and Mining. Lecture 8. by Hossen Asiful Mustafa

Symantec Enterprise Vault

Overview. Introduction to Data Warehousing and Business Intelligence. BI Is Important. What is Business Intelligence (BI)?

Business Process Engines in Distributed Knowledge Management Systems

Dr.G.R.Damodaran College of Science

Recently Updated Dumps from PassLeader with VCE and PDF (Question 1 - Question 15)

Question: 1 What are some of the data-related challenges that create difficulties in making business decisions? Choose three.

WORK EXPERIENCE: Client: On Time BI, Dallas, TX Project: Sr. BODI Developer

MCSA SQL SERVER 2012

Data Warehousing ETL. Esteban Zimányi Slides by Toon Calders

FIVE BEST PRACTICES FOR ENSURING A SUCCESSFUL SQL SERVER MIGRATION

Partner Presentation Faster and Smarter Data Warehouses with Oracle OLAP 11g

CHAPTER 1: INTRODUCTION

Transcription:

Benefits of Automating Data Warehousing Introduction Data warehousing can be defined as: A copy of data specifically structured for querying and reporting. In most cases, the data is transactional data pulled from a number of different applications in the enterprise, but it can be any type of data. The goal of a data warehouse is to provide an enterprise with high quality information that they can use to make tactical and strategic decisions. Typical data warehouse operations deal with extremely large amounts of data. Missing one step in the process, or executing a step at the wrong time, can result in a significant amount of wasted processing time, or in the worst case scenario, bad data. This White Paper explains how UC4 can be used to schedule all software used in the data warehouse process and increase efficiency in the data warehouse process. Introduction... 2 The Process... 2 The Problem... 2 The Solution... 2 Benefits... 3 Background... 3 Automating the Data Warehouse with UC4... 4 Benefits to the Data Warehouse Team... 4 Definitions... 5 About UC4 Software... 7 Copyright 2008 UC4 Software GmbH (UC4), all rights reserved. The materials in this publication are protected by copyright and/or other intellectual property laws. Any unauthorized use of the materials in this publication can result in an infringement of these laws. Unless expressly permitted, the copying of information or documents from this publication, in any form, without the prior written permission of UC4 is prohibited. 1

Introduction Most often the data needed for the data warehouse is extracted from a variety of systems and applications to a central relational repository. It is not unusual for a corporation s applications to reside on different platforms such as mainframes, UNIX, AS/400, and Windows. These applications store data in different databases or nonrelational data for-mats. The data most often is stored in different formats using different codes to mean the same thing. For example, in application A gender may be stored as Male, but in application B gender is stored as M, and in application C it's stored as 01. A data warehouse converts the data to a uniform format and provides the ability to view, compare, and correlate information at an enterprise level. Data Warehouses can be used as application feeds. Often a data warehouse provides content for information displayed on portal pages. Additionally, data warehouses can be the back end to custom developed EIS (enterprise information systems). The Process There are five major steps in a data warehouse process. The first three steps consist of the building and subsequent loading of the data warehouse. This is commonly referred to as Extraction, Transformation, and Loading (ETL). The last two steps consist of creating data marts and reporting. 1. Extracting data from applications. In this step the data is extracted from the application s data structure and most often placed in a temporary data structure or occasionally left in a file. 2. Transforming, or staging, the data (getting it in the proper format, often referred to as transforming ). In the transformation step, the different data formats are reconciled. For example, all data formats for gender are transformed to either Male, Female, Other. 3. Loading the transformed data into the data warehouse. 4. Creating data marts from the data warehouse. 5. Reporting (turning the data into information) The steps are discussed in greater detail later in this paper under the section titled Background. The Problem Running a data warehouse requires coordinating many operations across many applications, databases, and systems. In a large company, this might be as many as 20 discreet operations that must be performed in the correct sequence, at the correct time, and under the correct conditions. Typical data warehouse operations deal with extremely large amounts of data. It is not unusual for a large organization to load gigabytes of data nightly. Often data warehouses grow to sizes large than one terabyte. Missing one step in the process, or executing a step at the wrong time, can result in a significant amount of wasted processing time, or in the worst case scenario, bad data. The Solution UC4 Workload Automation Suite is tailor made for automating the data warehousing process. Each step in the process becomes a module in one or more chains. The chains can be run on a set schedule, and individual steps in the chains can be executed based on a wide range of criteria such as the number of transactions or the number of records in the data warehouse. UC4 is application aware and can ensure that application processes are run prior to data extractions. Data extracts, often long processes, can be checked periodically to ensure that the data being extracted is in the correct format. Notification of nightly failures can be automated. Complex recovery sequences can be automated. Reports based on the data can be run and distributed automatically. Load on the system can be leveled using UC4 queues. 2

UC4 can interface with ETL (Extract, Transform, Load) tools such as Informatica. Using UC4 ensures that ETL transactions occur at the appropriate times. ETL tool processes can be scheduled with knowledge of application processes and backups. UC4 can run the entire data warehouse environment, including all applications, scripts, and tools. It provides a single point of control. Benefits There are many benefits to using UC4 Workload Automation Suite to control a data warehouse: UC4 ensures the proper execution sequence, resulting in fewer errors, fewer recoveries, and higher data warehouse availability. If scripts are being used to manage the E-T-L process, UC4 can replace the scripts, reducing coding and maintenance time. Fewer scripts mean easier maintenance and less dependence on complex scripts that may have been written by one person who has since moved on to another position. The data warehouse runs more efficiently because UC4 initiates actions based on the state of the application and data warehouse databases, not on set times. Lag time between steps is eliminated. End users get reports online within minutes of the data being available because UC4 scheduled the generation of the data and knows when it can run the reports. System resources are maximized because UC4 schedules all applications used in the data warehouse environment, levels the load on the system, and prevents the system from being over loaded. These benefits hold for new, as well as existing, data warehousing projects. Background A data warehouse is a very large database containing data from many sources in an enterprise. The data warehouse is at the center of an entire data collection and distribution process. This process: Extracts the data from enterprise sources such as ERP, CRM, and financial systems. These sources often reside on different platforms such as UNIX, Windows, mainframe, and AS400. Transforms and cleanses the data so that data coming from the different sources is standardized or normalized. In the transformation step, the different data formats are reconciled. For example, all data formats for gender are transformed to either ( Male, Female, Other ). Places the normalized data in a staging area. Loads the data into the actual data warehouse database and operational data store. The data warehouse is usually a relational database running on a UNIX platform. Loads the data via OLAP tools into small databases called data marts for fast data access by specific groups within the enterprise. The data is usually stored in data cubes. A data cube presents data based on two or more attributes. For example, it might summarize sales by region (two attributes) or sales by region by product (three attributes). Products such as Cognos PowerPlay create optimized cubes of data for distribution. Data marts are often distributed to remote sites for faster access. Reports are usually generated from the data marts rather than the data warehouse. Turns the data into information using various reporting tools. 3

The process is illustrated in the figure below. Data Warehouse Environment Source Data Staging Data Warehouse Data Marts (periodic data for strategic decisions) Reporting Enterprise ERP System Sales CRM System Financials Other systems External E-T-L E-T-L Tools Data Junction Data Stage Informatica Ascential SAS Data Mirror AbInitio Sagent Sripting and SQL*Plus Normalized Data E-T-L Data Warehouse (gigabytes to terabytes) Operational Data Store (real-time data for tactical decisions) E-T-L Marketing Accounting Other Reporting Tools Crystal Reports Brio Oracle Reports Actuate Cognos Business Objects MicroStrategy Hyperion Figure A. The data warehouse environment includes data sources, a staging area that aggregates and transforms the data, a data warehouse that stores the transformed data, data marts that provide quick access to subsets of the data, and reporting tools for turning the data into information. Automating the Data Warehouse with UC4 UC4 Workload Automation Suite can be used to schedule all software used in the data warehouse process: Enterprise applications such as ERP (Enterprise Resource Planning), CRM (Customer Resource Management), Financials, and other internal applications Backup tools E-T-L tools OLAP tools Reporting tools In addition, UC4 can increase efficiency in the data warehouse process by: Eliminating the lag time between processing steps. Balancing the load placed on the servers by the enterprise applications, E-T-L tools, and reporting tools. Eliminating the scripts used to archive data cubes in the data marts. Distributing output generated by the reporting tools. Benefits to the Data Warehouse Team Building a data warehouse generally requires a team of experts. The different roles and the benefits UC4 Workload Automation Suite brings to each role are described below. One person on the team may function in more than one role. Project Manager. This person will oversee the progress and be responsible for the success of the datawarehousing project. UC4 benefits the Project Manager during the implementation and maintenance phase of the project. The UC4 script-less approach reduces implementation times and provides the framework for promotion to the production environment. During the maintenance phase, UC4 assists the Project Manager in managing the processes and notifications of failure. The UC4 Graphical Analysis Package can be used to analyze the impact of new data source additions. The project manager often reports to the CIO or the UC4. 4

DBA (Database Analyst). This person is responsible for keeping the database running smoothly. Additional tasks are to plan and execute a backup/recovery plan, as well as performance tuning. UC4 benefits the DBA during the implementation and maintenance phase of the project. UC4 allows the data warehouse to be enterprise aware including backups. The DBA can get notifications of failed processes, automate recovery, and proactively check the health of database structures prior to loading. Technical Architect. This person is responsible for developing and implementing the overall technical architecture of the data warehouse, from the backend hardware/software to the client desktop configurations. UC4 benefits the Technical Architect by providing a single point of control for enterprise processes including the data warehouse processes. ETL Developer. This person is responsible for planning, developing, and deploying the extraction, transformation, and loading routine for the data warehouse. ETL tools provide their own scheduling system. The ETL developers may be resistant to moving control of ETL processes to an enterprise scheduler. However, the benefits they receive are offloading the production support to an operations group, notifications of failures, and automation of recovery procedures. E-T-L is often done with scripts. Developers are usually more interested in developing the scripts than maintaining them. UC4 eliminates the need for scripts, doing away with the maintenance issue. Front End Developer. This person is responsible for developing the front-end, whether it be client-server or over the Web. Front-end developers can kick off processes from their front-end application. OLAP Developer. This person is responsible for the development of OLAP cubes. OLAP tools provide their own scheduling system. The OLAP developers may be resistant to moving control of OLAP processes to an enterprise scheduler. However the benefits they receive is offloading the production support to an operations group, notifications of failures, automation of recovery procedures, and automation of the cube distribution. Data Modeler. This person is responsible for taking the data structure that exists in the enterprise and modeling it into a schema that is suitable for OLAP analysis. They architect the transformation steps. Definitions Following are definitions for the most common terms used when discussing data warehouses. The definitions are extracted from a variety of Web sites. Data Warehousing. An enterprise-wide system that extracts data from different servers and platforms and collects it in a single database. A data warehouse effectively consolidates data from multiple sources. Data Warehouse. A very large database with special tools used to extract and cleanse data from operational systems and to analyze data. Data from various online transaction processing (OLTP) applications and other sources is selectively extracted and organized on the data warehouse database for use by analytical applications and user queries. Data warehousing emphasizes the capture of data from diverse sources for useful analysis and access, but does not generally start from the point-of-view of the end user or knowledge worker who may need access to specialized, sometimes local databases. Data Mart. A focused subset of a data warehouse that deals with a single area of data and is organized for quick analysis. The data is usually stored in data cubes. The emphasis of a data mart is on meeting the specific demands of a particular group of knowledge users in terms of analysis, content, presentation, and ease-of-use. Users of a data mart can expect to have data presented in terms that are familiar. Data Cube. A data cube presents data based on two or more attributes. For example, it might summarize sales by region (two attributes) or sales by region by product (three attributes). Information in data marts is usually stored in data cubes. Data cubes are usually archived for historical reporting. The archiving process is almost always scripted. UC4 can replace the archiving scripts. 5

Data Mining. Sorting through data to identify patterns and establish relationships. Data mining parameters include: Association - looking for patterns where one event is connected to another event Sequence or path analysis - looking for patterns where one event leads to another later event Classification - looking for new patterns (May result in a change in the way the data is organized but that's ok) Clustering - finding and visually documenting groups of facts not previously known Forecasting - discovering patterns in data that can lead to reasonable predictions about the future Decision Support System (DSS). An application that analyzes business data and presents it so users can make business decisions more easily. It is an "informational application" (in distinction to an "operational application" that collects the data in the course of normal business operation). Typical information that a decision support application might gather and present would be: Comparative sales figures between one week and the next Projected revenue figures based on new product sales assumptions The consequences of different decision alternatives, given past experience in a context that is described ETL. Stands for extract, transform, and load. The extract function reads data from a specified source database and extracts a desired subset of data. The transform function works with the acquired data - using rules or lookup tables, or creating combinations with other data, to convert it to the desired state. The load function is used to write the resulting data (either all of the subset or just the changes) to a target database, which may or may not previously exist. ETL can be used to acquire a temporary subset of data for reports or other purposes. A more permanent data set may be acquired for other purposes such as: the population of a data mart or data warehouse conversion from one database type to another migration of data from one database or platform to another Metadata. Metadata is data that describes other data. Specifically, it is files or databases with information about another database s attributes, structure, processing, or changes. It can describe any characteristic of the data such as the content or its quality. The normalized data created in the staging database can be considered metadata. A lower level technology example of metadata is columns in a table. A DBA may create a person table: DATA METADATA (COLUMN NAME AND PHYSICAL LAYOUT) Debbie Hamel Name, Character 20 Metadata Evangelist Title, Character 24 VA State Code, Character 2 11111 Employee Id, Numeric OLAP (online analytical processing). In data warehousing, OLAP is used to generate the data marts. OLAP makes it easy for a user to selectively extract and view data from different points of view. In short, it provides custom reporting. For example, a user can request a report that shows all of a company's beach ball products sold in Florida in the month of July, compares revenue figures with those for the same products in September, and then compares other product sales in Florida in the same time period. To facilitate this kind of analysis, OLAP data is stored in a multidimensional database called a cube. Whereas a relational database can be thought of as two-dimensional, a multidimensional database considers each data 6

attribute (such as product, geographic sales region, and time period) as a separate "dimension." OLAP software can locate the intersection of dimensions (all products sold in the Eastern region above a certain price during a certain time period) and display them. Attributes such as time periods can be broken down into sub-attributes. Time Fencing. The practice of scheduling data extraction based on estimated completion times of prerequisite processes. For example, a financials application may run from 1 A.M. to 2:30 A.M. every night. A process to extract data from this application may be scheduled for 3 A.M. to ensure the financials application process is complete. This often can result in the extraction process being run against bad data, wasting time and resources. UC4 can eliminate time fencing by making the extraction contingent on the successful completion of the financials process. About UC4 Software UC4 Software is a leading provider of workload automation and IT process optimization solutions that ensure core business processes and enterprise information systems run faster, more accurately and without interruption. More than 1,600 companies worldwide have successfully enhanced application processing performance and improved IT efficiency using UC4 s business acceleration solutions. Customers include American Suzuki Motor Corporation, Cadbury, ebay, Eastman Kodak, General Electric, Mattel, McGraw Hill, Panasonic, Robert Bosch, Sun Microsystems, Symantec, T-Systems and Verizon. For more information, please visit www.uc4.com. Contact UC4 Software In the US at (877) 464-7300 (toll-free) In Europe at +43 2233 77880 www.uc4.com info@uc4.com 7