Data Warehouse and Data Mining

Lecture No. 03: Architecture of DW
Naeem Ahmed
Email: naeemmahoto@gmail.com
Department of Software Engineering, Mehran University of Engineering and Technology, Jamshoro

Basic Architecture Architecture of DW

Data Warehouse Architecture

Operational Source Systems
- These are the operational systems of record that capture the transactions of the business.
- They lie outside the data warehouse, which has little or no control over the content and format of their data.
- The source systems maintain little historical data.
- They generate operational data that is detailed, current, and subject to change.

Data Staging Area
The data staging area can be divided into three phases:
- Extraction (E)
- Transformation (T)
- Loading (L)
Extraction: reading and understanding the source data and copying the data needed for the data warehouse into the staging area for further manipulation (i.e. transformation).

Data Staging Area
Loading: populating the data warehouse with data that has been extracted from the operational systems. Two types of load generally take place in a data warehouse environment:
- Initial load
- Incremental load
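The difference between the two load types can be sketched in a few lines of Python. This is a minimal illustration, not a production loader: the `updated` column, the row layout, and the function names are assumptions made for the example. Changed rows are simply overwritten here, whereas a real warehouse would often keep their history.

```python
from datetime import date

def initial_load(source_rows):
    """Initial load: copy every source row into an empty warehouse table."""
    return {row["id"]: dict(row) for row in source_rows}

def incremental_load(warehouse, source_rows, last_load):
    """Incremental load: apply only rows changed after the previous load."""
    changed = [row for row in source_rows if row["updated"] > last_load]
    for row in changed:
        warehouse[row["id"]] = dict(row)  # insert a new row or overwrite a changed one
    return len(changed)

# Hypothetical operational data (the "updated" column is an assumption).
source = [
    {"id": 1, "amount": 100, "updated": date(2024, 1, 1)},
    {"id": 2, "amount": 250, "updated": date(2024, 1, 2)},
]
warehouse = initial_load(source)

# Later, the operational system changes one row and adds another.
source[1] = {"id": 2, "amount": 300, "updated": date(2024, 1, 5)}
source.append({"id": 3, "amount": 75, "updated": date(2024, 1, 6)})
applied = incremental_load(warehouse, source, last_load=date(2024, 1, 2))
```

Only the two rows touched after the previous load are applied in the second step; the unchanged row is not read into the warehouse again.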

Data Staging Area
Transformation: the transformation phase applies a series of rules or functions to the extracted data. This may include some or all of the following:
- Selecting only certain columns to load (or, equivalently, excluding the columns that should not be loaded)
- Translating coded values
- Deriving a new calculated value (e.g. sale_amount = qty * unit_price)
- Denormalizing the data to fit the data warehouse schema
- Summarizing multiple rows of data (e.g. total sales for each region)
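Several of these rules can be illustrated with a small Python sketch. The column names, the `REGION_NAMES` code table, and the sample rows are hypothetical; only the sale_amount = qty * unit_price derivation comes from the slide.

```python
from collections import defaultdict

REGION_NAMES = {"N": "North", "S": "South"}  # hypothetical code table

def transform(rows):
    """Select columns, translate coded values, and derive sale_amount."""
    out = []
    for r in rows:
        out.append({
            # Translate the coded region value into a readable name.
            "region": REGION_NAMES.get(r["region_code"], "Unknown"),
            # Derive a new calculated value.
            "sale_amount": r["qty"] * r["unit_price"],
            # Columns such as "clerk" are not selected, so they are dropped.
        })
    return out

def summarize_by_region(rows):
    """Summarize multiple rows: total sales for each region."""
    totals = defaultdict(float)
    for r in rows:
        totals[r["region"]] += r["sale_amount"]
    return dict(totals)

extracted = [
    {"order_id": 1, "region_code": "N", "qty": 2, "unit_price": 10.0, "clerk": "a"},
    {"order_id": 2, "region_code": "S", "qty": 1, "unit_price": 5.0, "clerk": "b"},
    {"order_id": 3, "region_code": "N", "qty": 3, "unit_price": 4.0, "clerk": "a"},
]
clean = transform(extracted)
totals = summarize_by_region(clean)
```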

Data Staging Area
The data staging area:
- Is both a storage area and a processing area (the ETL process).
- Represents everything that happens between the operational source systems and the data presentation area.
- Is off-limits to business users and provides no query or presentation services; this is its key architectural requirement.
- Should be accessible only to skilled professionals.

ETL versus ELT
ETL (the traditional approach): ETL (extract, transform, and load) is a data warehousing process that involves:
- Extracting data from outside sources
- Transforming it to fit business needs
- Ultimately loading it into the data warehouse
ELT (the Teradata approach): the ELT (extract, load, and transform) strategy extracts the data and loads it into a Teradata Database first, then uses the power and performance of the Teradata warehouse to perform the transformation.
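The contrast comes down to where the transformation runs. The sketch below uses Python's built-in `sqlite3` module as a stand-in database; it is not Teradata's interface, and the table and column names are assumptions made for the example.

```python
import sqlite3

source_rows = [("N", 2, 10.0), ("S", 1, 5.0)]  # (region, qty, unit_price)

def etl(conn, rows):
    # ETL: transform in application code, then load the finished rows.
    transformed = [(region, qty * price) for region, qty, price in rows]
    conn.execute("CREATE TABLE sales_etl (region TEXT, sale_amount REAL)")
    conn.executemany("INSERT INTO sales_etl VALUES (?, ?)", transformed)

def elt(conn, rows):
    # ELT: load the raw rows first, then transform inside the database with SQL.
    conn.execute("CREATE TABLE raw_sales (region TEXT, qty INTEGER, unit_price REAL)")
    conn.executemany("INSERT INTO raw_sales VALUES (?, ?, ?)", rows)
    conn.execute(
        "CREATE TABLE sales_elt AS "
        "SELECT region, qty * unit_price AS sale_amount FROM raw_sales"
    )

conn = sqlite3.connect(":memory:")
etl(conn, source_rows)
elt(conn, source_rows)
etl_result = conn.execute(
    "SELECT region, sale_amount FROM sales_etl ORDER BY region").fetchall()
elt_result = conn.execute(
    "SELECT region, sale_amount FROM sales_elt ORDER BY region").fetchall()
```

Both paths end with the same rows. In the ELT path the raw data also remains available in the database, and the multiplication is executed by the database engine rather than by the loading program.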

Data Presentation Area
Extended relational DBMS (ROLAP servers):
- Data is stored in relational databases using star-join schemas
- Support for SQL extensions (e.g. CUBE)
- Index structures (bitmap, join)
Multidimensional DBMS (MOLAP servers):
- Data is stored in n-dimensional arrays
- Direct access to the array data structure
- Poor storage utilization, especially when the data is sparse
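The storage trade-off can be made concrete with a toy cube. This is only a sketch with made-up dimensions and sales figures; real MOLAP servers use compression and chunking that are not modeled here.

```python
# Hypothetical cube: 4 products x 3 regions x 5 days, with only 3 recorded sales.
products, regions, days = 4, 3, 5
facts = {(0, 1, 2): 120.0, (3, 0, 0): 40.0, (1, 2, 4): 15.5}

# MOLAP-style storage: a dense 3-dimensional array, one cell per combination.
cube = [[[0.0] * days for _ in range(regions)] for _ in range(products)]
for (p, r, d), amount in facts.items():
    cube[p][r][d] = amount

# ROLAP-style storage: one fact-table row per recorded sale.
fact_table = [(p, r, d, amount) for (p, r, d), amount in facts.items()]

total_cells = products * regions * days   # cells allocated by the array
utilization = len(facts) / total_cells    # fraction of cells holding data
direct_lookup = cube[0][1][2]             # direct access by array position
```

The array answers a cell lookup by position alone, but it allocates 60 cells to hold 3 values, while the fact table stores just 3 rows.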

Data Presentation Area
The data presentation area:
- Is where data is organized, stored, and made available for queries, report writers, and other analytical processing.
- Is the data warehouse as far as the business community is concerned.

Data Access Tools
- Analysis / OLAP / DSS tools
- Querying / reporting tools
- Data mining

Warehouse components

Component: Operational Data
The data for the data warehouse is supplied from the following sources:
- Mainframe systems that hold data in the traditional network and hierarchical formats
- Relational DBMSs such as Oracle and Informix
- In addition to these internal sources, external data obtained from commercial databases and from databases associated with suppliers and customers

Component: Load Manager
- The load manager (also called the front end component) performs all the operations associated with extracting data and loading it into the data warehouse.
- These operations include simple transformations that prepare the data for entry into the warehouse.
- The size and complexity of this component vary between data warehouses. It may be constructed from a combination of vendor data loading tools and custom-built programs.

Component: Warehouse Manager
- The warehouse manager performs all the operations associated with the management of data in the warehouse.
- This component is built using vendor data management tools and custom-built programs.
The operations performed by the warehouse manager include:
- Analyzing data to ensure consistency
- Transforming and merging the source data from temporary storage into the data warehouse tables
- Creating indexes and views on the base tables
- Generating denormalizations
- Generating aggregations
- Backing up and archiving data

Warehouse Manager: Detailed Data
- This area of the warehouse stores all the detailed data in the database schema.
- In most cases the detailed data is not stored online but is aggregated to the next level of detail.
- However, detailed data is added to the warehouse regularly to supplement the aggregated data.

Warehouse Manager: Lightly and Highly Summarized Data
- This area of the data warehouse stores all the predefined lightly and highly summarized (aggregated) data generated by the warehouse manager.
- This area is transient: it changes on an ongoing basis in response to changing query profiles.
- The purpose of the summarized information is to speed up query performance.
- The summarized data is updated continuously as new data is loaded into the warehouse.
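The idea of keeping summaries current as data is loaded can be sketched as follows. The grains chosen here (region and month for the light summary, region alone for the high summary) and the `load` function are assumptions for illustration.

```python
from collections import defaultdict

detail = []                  # detailed data: one entry per sale
light = defaultdict(float)   # lightly summarized: total per (region, month)
high = defaultdict(float)    # highly summarized: total per region

def load(rows):
    """Append new detail rows and update both summary levels in the same pass."""
    for region, month, amount in rows:
        detail.append((region, month, amount))
        light[(region, month)] += amount
        high[region] += amount

load([
    ("North", "2024-01", 100.0),
    ("North", "2024-02", 50.0),
    ("South", "2024-01", 30.0),
])
load([("North", "2024-02", 25.0)])  # a later load updates the summaries too

north_total = high["North"]                # answered without scanning detail
north_feb = light[("North", "2024-02")]
```

A query for the total per region reads one precomputed entry instead of scanning every detail row, which is the performance benefit the slide refers to.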

Warehouse Manager: Archive and Back-up Data
- This area of the warehouse stores detailed and summarized data for the purpose of archiving and back-up.
- The data is transferred to storage archives such as magnetic tapes or optical disks.

Warehouse Manager: Meta Data
The data warehouse also stores all the meta data (data about data) definitions used by all processes in the warehouse. Meta data is used for a variety of purposes, including:
- The extraction and loading process: meta data is used to map data sources to a common view of information within the warehouse.
- The warehouse management process: meta data is used to automate the production of summary tables.
- The query management process: meta data is used to direct a query to the most appropriate data source.
The structure of the meta data differs between these processes because its purpose differs.
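The query management use can be illustrated with a small routing sketch. The `METADATA` entries, table names, and row counts are hypothetical, and choosing the smallest covering table is one simple rule assumed for the example, not a description of any particular product.

```python
# Hypothetical meta data: each table, the columns it can answer, and its size.
METADATA = [
    {"table": "sales_by_region", "columns": {"region"}, "rows": 10},
    {"table": "sales_by_region_month", "columns": {"region", "month"}, "rows": 120},
    {"table": "sales_detail",
     "columns": {"region", "month", "product", "order_id"}, "rows": 1_000_000},
]

def route_query(needed_columns):
    """Return the smallest table whose columns cover the query."""
    candidates = [m for m in METADATA if needed_columns <= m["columns"]]
    if not candidates:
        raise LookupError(f"no table covers columns {sorted(needed_columns)}")
    return min(candidates, key=lambda m: m["rows"])["table"]

choice_region = route_query({"region"})            # highly summarized table
choice_monthly = route_query({"region", "month"})  # lightly summarized table
choice_product = route_query({"product"})          # only the detail table fits
```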

Component: Query Manager
- The query manager (also called the back end component) performs all operations associated with the management of user queries.
- This component is usually constructed using vendor end-user access tools, data warehouse monitoring tools, database facilities, and custom-built programs.
- The complexity of a query manager is determined by the facilities provided by the end-user access tools and the database.

Component: End-user Access Tools
- The principal purpose of a data warehouse is to provide information to business managers for strategic decision-making.
- These users interact with the warehouse through end-user access tools.
Examples of end-user access tools include:
- Reporting and query tools
- Application development tools
- Executive information system tools
- Online analytical processing (OLAP) tools
- Data mining tools