Data Warehouse and Data Mining

Data Warehouse and Data Mining
Lecture No. 04-06: Data Warehouse Architecture
Naeem Ahmed
Email: naeemmahoto@gmail.com
Department of Software Engineering
Mehran University of Engineering and Technology, Jamshoro

Data Warehouse
[Figure: Operational Data, Data Warehouse, Access Tools, End Users]

Data Warehouse
A data warehouse is a central, enterprise-wide database that contains information extracted from the operational data stores.
Operational system: a system used to process the day-to-day transactions of an organization.

Data Warehouse Architecture

Operational Source Systems
- These are the operational systems of record that capture the transactions of the business.
- They lie outside the data warehouse, which has no control over the content and format of their data.
- The source systems maintain little historical data.
- They generate operational data that is detailed, current, and subject to change.

Data Staging Area
Data staging can be divided into three phases:
- Extraction (E)
- Transformation (T)
- Loading (L)
Extraction: reading and understanding the source data and copying the data needed for the data warehouse into the staging area for further manipulation (i.e. transformation).

Data Staging Area
Loading: populating the data warehouse with data that has been extracted from operational systems. Two types of load generally take place in a data warehouse environment:
- Initial load
- Incremental load
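The difference between the two load types can be sketched with Python's built-in sqlite3 module. The table names (src_sales, dw_sales) and the use of an ever-increasing id as the change marker are assumptions made for illustration; the slides do not prescribe a mechanism.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_sales (id INTEGER PRIMARY KEY, amount REAL);  -- hypothetical operational source
    CREATE TABLE dw_sales  (id INTEGER PRIMARY KEY, amount REAL);  -- hypothetical warehouse target
""")

def load(conn):
    """Copy source rows whose id is higher than any id already in the warehouse.

    On an empty warehouse this behaves as the initial load (every row is copied);
    on later runs it behaves as an incremental load (only new rows are copied).
    """
    last_id = conn.execute("SELECT COALESCE(MAX(id), 0) FROM dw_sales").fetchone()[0]
    cur = conn.execute(
        "INSERT INTO dw_sales SELECT id, amount FROM src_sales WHERE id > ?", (last_id,)
    )
    conn.commit()
    return cur.rowcount  # number of rows loaded in this run

conn.executemany("INSERT INTO src_sales VALUES (?, ?)", [(1, 10.0), (2, 20.0)])
initial_rows = load(conn)       # initial load: copies both existing rows

conn.execute("INSERT INTO src_sales VALUES (3, 30.0)")
incremental_rows = load(conn)   # incremental load: copies only the new row
```

This sketch only picks up newly inserted rows; detecting updated or deleted source rows would need a different change marker, such as a modification timestamp.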

Data Staging Area
Transformation: the transformation phase applies a series of rules or functions to the extracted/loaded data. This may include some or all of the following:
- Selecting only certain columns to load (or, if you prefer, selecting the null columns not to load)
- Translating coded values
- Deriving a new calculated value (e.g. sale_amount = qty * unit_price)
- Denormalizing the data to fit the data warehouse schema
- Summarizing multiple rows of data (e.g. total sales for each region)
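Several of these transformation rules can be sketched in a few lines of plain Python. The rows, column names, and region codes below are hypothetical; only the formula sale_amount = qty * unit_price comes from the slide.

```python
from collections import defaultdict

# Hypothetical extracted rows; column names and region codes are illustrative.
extracted = [
    {"order_id": 1, "region_code": "N", "qty": 2, "unit_price": 5.0, "comment": "gift"},
    {"order_id": 2, "region_code": "S", "qty": 1, "unit_price": 8.0, "comment": ""},
    {"order_id": 3, "region_code": "N", "qty": 4, "unit_price": 2.5, "comment": "rush"},
]

REGION_NAMES = {"N": "North", "S": "South"}               # lookup for coded values
KEEP = ("order_id", "region_code", "qty", "unit_price")   # columns selected for loading

def transform(row):
    out = {k: row[k] for k in KEEP}                        # select only certain columns
    out["region"] = REGION_NAMES[out.pop("region_code")]   # translate the coded value
    out["sale_amount"] = out["qty"] * out["unit_price"]    # derive a calculated value
    return out

transformed = [transform(r) for r in extracted]

# Summarize multiple rows: total sales for each region.
sales_by_region = defaultdict(float)
for r in transformed:
    sales_by_region[r["region"]] += r["sale_amount"]
```

Here the comment column is dropped, region codes become names, and the three order rows collapse into one total per region.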

ETL versus ELT
ETL (the traditional approach): ETL (extract, transform, load) is a data warehousing process that involves:
- Extracting data from outside sources
- Transforming it to fit business needs
- Loading it into the data warehouse
ELT (the Teradata approach): the ELT (extract, load, transform) strategy extracts the data and loads it into a Teradata Database first, then uses the power and performance of the Teradata warehouse to perform the transformation.
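The ELT ordering can be sketched as follows. An in-memory SQLite database stands in for the target warehouse purely for illustration (this is not Teradata code), and the table and column names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the target database

# Extract + Load: raw rows are copied unchanged into a staging table.
conn.execute("CREATE TABLE stg_sales (region TEXT, qty INTEGER, unit_price REAL)")
raw_rows = [("North", 2, 5.0), ("South", 1, 8.0), ("North", 4, 2.5)]
conn.executemany("INSERT INTO stg_sales VALUES (?, ?, ?)", raw_rows)

# Transform: performed afterwards by the database engine itself, in SQL.
conn.execute("""
    CREATE TABLE sales_by_region AS
    SELECT region, SUM(qty * unit_price) AS total_sales
    FROM stg_sales
    GROUP BY region
""")

result = dict(conn.execute("SELECT region, total_sales FROM sales_by_region"))
```

In an ETL pipeline the same aggregation would run in a separate transformation step before anything is written to the warehouse; in ELT the database does the work after loading.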

Data Presentation Area
Extended relational DBMS (ROLAP servers):
- Data stored in a relational database
- Star-join schemas
- Support for SQL extensions (CUBE)
- Index structures (bitmap, join)
Multidimensional DBMS (MOLAP servers):
- Data stored in n-dimensional arrays
- Direct access to the array data structure
- Poor storage utilization, especially when the data is sparse
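The MOLAP trade-off (direct array access versus poor utilization on sparse data) can be made concrete with a small sketch. The dimensions and facts are hypothetical, and nested Python lists stand in for a real multidimensional storage engine.

```python
# Hypothetical dimensions of a small sales cube.
products = ["P1", "P2", "P3", "P4"]
regions = ["North", "South", "East"]
months = ["Jan", "Feb"]

# Facts that actually exist: only 3 of the 24 possible cells.
facts = {
    ("P1", "North", "Jan"): 100.0,
    ("P3", "East", "Feb"): 40.0,
    ("P4", "South", "Jan"): 75.0,
}

# MOLAP-style storage: one array cell per combination of dimension values.
cube = [[[0.0 for _ in months] for _ in regions] for _ in products]
for (p, r, m), value in facts.items():
    cube[products.index(p)][regions.index(r)][months.index(m)] = value

def cell(p, r, m):
    """Direct access: dimension positions translate straight into array offsets."""
    return cube[products.index(p)][regions.index(r)][months.index(m)]

total_cells = len(products) * len(regions) * len(months)
utilization = len(facts) / total_cells  # fraction of allocated cells that hold data
```

With 3 facts in 24 allocated cells, only 12.5% of the array holds data. A ROLAP fact table would store just the three rows, at the cost of looking them up through keys and indexes rather than array offsets.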

Data Access Tools
- Analysis / OLAP / DSS tools
- Querying / reporting tools
- Data mining

Data warehouse bus architecture

Warehouse components

Component: Operational Data
The data for the data warehouse is supplied from:
- Mainframe systems, in the traditional network and hierarchical formats
- Relational DBMSs such as Oracle and Informix
- External data, in addition to these internal sources, obtained from commercial databases and from databases associated with suppliers and customers

Component: Load Manager
- The load manager (also called the front-end component) performs all the operations associated with extracting data and loading it into the data warehouse.
- These operations include simple transformations that prepare the data for entry into the warehouse.
- The size and complexity of this component vary between data warehouses; it may be constructed from a combination of vendor data-loading tools and custom-built programs.

Component: Warehouse Manager
- The warehouse manager performs all the operations associated with the management of data in the warehouse.
- This component is built using vendor data management tools and custom-built programs.
- The operations performed by the warehouse manager include:
  - Analyzing data to ensure consistency
  - Transforming and merging the source data from temporary storage into data warehouse tables
  - Creating indexes and views on the base tables
  - Generating denormalizations
  - Generating aggregations
  - Backing up and archiving data

Warehouse Manager: Detailed Data
- This area of the warehouse stores all the detailed data in the database schema.
- In most cases the detailed data is not stored online but is aggregated to the next level of detail.
- However, detailed data is added regularly to the warehouse to supplement the aggregated data.

Warehouse Manager: Lightly and Highly Summarized Data
- This area of the warehouse stores all the predefined lightly and highly summarized (aggregated) data generated by the warehouse manager.
- The area is transient: it changes on an ongoing basis in response to changing query profiles.
- The purpose of the summarized information is to speed up query performance.
- The summarized data is updated continuously as new data is loaded into the warehouse.

Warehouse Manager: Archive and Back-up Data
- This area of the warehouse stores detailed and summarized data for the purpose of archiving and back-up.
- The data is transferred to storage archives such as magnetic tapes or optical disks.

Warehouse Manager: Metadata
- The data warehouse also stores all the metadata (data about data) definitions used by all processes in the warehouse.
- Metadata is used for a variety of purposes, including:
  - Extraction and loading: metadata is used to map data sources to a common view of information within the warehouse.
  - Warehouse management: metadata is used to automate the production of summary tables.
  - Query management: metadata is used to direct a query to the most appropriate data source.
- The structure of the metadata differs for each process, because the purpose is different.
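The query-management use of metadata can be sketched as a lookup that picks the smallest table able to answer a query. The metadata layout, table names, and row counts are hypothetical, and "smallest covering table" is only one possible reading of "most appropriate data source".

```python
# Hypothetical metadata: each table is described by the columns it keeps
# and its approximate row count. Names and numbers are illustrative only.
METADATA = [
    {"table": "sales_detail",
     "columns": {"day", "month", "year", "product", "region"}, "rows": 1_000_000},
    {"table": "sales_by_month",
     "columns": {"month", "year", "product", "region"}, "rows": 50_000},
    {"table": "sales_by_year_region",
     "columns": {"year", "region"}, "rows": 200},
]

def route_query(needed_columns):
    """Return the smallest table whose columns cover everything the query needs."""
    candidates = [m for m in METADATA if set(needed_columns) <= m["columns"]]
    if not candidates:
        raise LookupError(f"no table provides columns {sorted(needed_columns)}")
    return min(candidates, key=lambda m: m["rows"])["table"]
```

For example, route_query({"year", "region"}) is answered from the highly summarized table, while a query that needs the day column falls back to the detailed data.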

Component: Query Manager
- The query manager (also called the back-end component) performs all operations associated with the management of user queries.
- This component is usually constructed using vendor end-user access tools, data warehouse monitoring tools, database facilities, and custom-built programs.
- The complexity of a query manager is determined by the facilities provided by the end-user access tools and the database.

Component: End-user Access Tools
- The principal purpose of a data warehouse is to provide information to business managers for strategic decision-making.
- These users interact with the warehouse through end-user access tools.
- Examples of end-user access tools:
  - Reporting and query tools
  - Application development tools
  - Executive information systems tools
  - Online analytical processing tools
  - Data mining tools

Warehouse Models and Data Models
- Relations
- Stars and snowflakes
- Cubes
Operators
- Slice and dice
- Roll-up, drill-down
- Pivoting
- Other operators
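The operators listed above can be illustrated on a tiny fact table in plain Python. The facts, dimension names, and helper functions (aggregate, slice_, dice) are all hypothetical; a real OLAP server would expose these operations through its own query language.

```python
from collections import defaultdict

# Hypothetical fact rows: (year, quarter, region, product, sales).
FACTS = [
    (2023, "Q1", "North", "TV", 100),
    (2023, "Q2", "North", "TV", 120),
    (2023, "Q1", "South", "Radio", 30),
    (2024, "Q1", "North", "Radio", 50),
    (2024, "Q1", "South", "TV", 90),
]
DIMS = ("year", "quarter", "region", "product")

def aggregate(rows, group_by):
    """Sum the sales measure over the given dimensions."""
    idx = [DIMS.index(d) for d in group_by]
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[i] for i in idx)] += row[-1]
    return dict(totals)

def slice_(rows, dim, value):
    """Slice: fix one dimension to a single value."""
    i = DIMS.index(dim)
    return [r for r in rows if r[i] == value]

def dice(rows, **allowed):
    """Dice: restrict several dimensions to sets of allowed values."""
    return [r for r in rows
            if all(r[DIMS.index(d)] in vals for d, vals in allowed.items())]

by_quarter = aggregate(FACTS, ["year", "quarter"])  # finer level (the drill-down view)
by_year = aggregate(FACTS, ["year"])                # roll-up: quarter -> year
north = slice_(FACTS, "region", "North")
tv_2023 = dice(FACTS, year={2023}, product={"TV"})

# Pivot: regions as rows, products as columns.
pivot = {}
for (region, product), total in aggregate(FACTS, ["region", "product"]).items():
    pivot.setdefault(region, {})[product] = total
```

Roll-up and drill-down are the same aggregation run at coarser or finer levels of a dimension hierarchy (here year versus year and quarter), while slice and dice only filter the cube without changing its level of detail.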