Data Warehouse and Data Mining


Lecture No. 02: Lifecycle of Data Warehouse
Naeem Ahmed
Email: naeemmahoto@gmail.com
Department of Software Engineering
Mehran University of Engineering and Technology, Jamshoro

Outline
- Lifecycle of DW
- Classical SDLC vs. DW SDLC
- Operating DW

Acknowledgements: Wolf-Tilo Balke and Silviu Homoceanu, http://www.ifis.cs.tu-bs.de

Lifecycle of DW
DW System Development Life Cycle (SDLC)
- Design
  - End-user interview cycles
  - Source system cataloging
  - Definition of key performance indicators
  - Mapping of decision-making processes underlying information needs
  - Logical and physical schema design

Lifecycle of DW
- Prototype
  - Objective is to constrain and, in some cases, reframe end-user requirements
- Deployment
  - Development of documentation
  - Training
  - Operations and management processes
- Operation
  - Day-to-day maintenance of the DW requires good management of the ongoing Extraction, Transformation and Loading (ETL) process

Lifecycle of DW
- Enhancement requires modification of
  - HW (physical components)
  - Operations and management processes
  - Logical schema designs

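Lifecycle of DW: Classical SDLC vs. DW SDLC
- DW SDLC is almost the opposite of classical SDLC: classical SDLC starts from requirements, whereas DW SDLC starts from data and arrives at the requirements last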

Lifecycle of DW: Classical SDLC vs. DW SDLC
- Because it is the opposite of SDLC, DW SDLC is also called CLDS (SDLC spelled backwards)

Lifecycle of DW
- CLDS is a data-driven development life cycle
  - It starts with data
  - Once data is at hand, it is integrated and tested against bias
  - Programs are written against the data, the results are analyzed, and finally the requirements of the system are understood
  - Once requirements are understood, adjustments are made to the design and the cycle starts all over
  - This corresponds to a spiral development methodology

Operating a DW
In operating a DW, the following phases can be identified:
- Monitoring
- Extraction
- Transforming
- Loading
- Analyzing

Operating a DW: Monitoring
- Monitoring
  - Surveillance of the data sources
  - Identification of data modifications that are relevant to the DW
  - Monitoring plays an important role in the whole process: it decides which data the next steps are applied to
- Monitoring techniques
  - Active mechanisms: Event Condition Action (ECA) rules
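The slide names ECA rules without giving an example, so the following is only a minimal Python sketch of the idea: a rule fires its action when a matching event arrives and its condition holds. The `EcaRule` class, the `dispatch` function, the event names and the 10,000 threshold are all hypothetical and do not come from the lecture or from any particular DBMS trigger syntax.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EcaRule:
    event: str                          # event name the rule listens for
    condition: Callable[[dict], bool]   # predicate over the changed row
    action: Callable[[dict], None]      # what to do when the condition holds


def dispatch(rules, event, row):
    """Run the action of every rule whose event matches and whose condition is true."""
    for rule in rules:
        if rule.event == event and rule.condition(row):
            rule.action(row)


# Hypothetical rule: queue large account updates for the next extraction.
pending = []
rules = [EcaRule(event="update_account",
                 condition=lambda row: row["amount"] > 10_000,
                 action=pending.append)]

dispatch(rules, "update_account", {"id": 1, "amount": 25_000})  # fires
dispatch(rules, "update_account", {"id": 2, "amount": 50})      # condition fails
dispatch(rules, "insert_account", {"id": 3, "amount": 99_000})  # event does not match
```

In a real source system the same pattern would typically be expressed as database triggers; the in-memory list here only stands in for whatever change queue the monitoring component maintains.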

Operating a DW: Monitoring
- Monitoring techniques
  - Replication mechanisms
    - Snapshot: local copy of data, similar to a view; used by Oracle 9i
    - Data replication: replicates and maintains data in destination tables through data propagation processes; used by IBM

Operating a DW: Monitoring
- Monitoring techniques
  - Protocol-based mechanisms
    - Since a DBMS writes protocol (log) data for transaction management, the protocol can also be used for monitoring
    - Difficult because the protocol format is proprietary and subject to change
  - Application-managed mechanisms
    - Hard to implement for legacy systems
    - Based on time stamping or data comparison
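The two application-managed approaches mentioned above, time stamping and data comparison, can be sketched roughly as follows. This is an illustration under assumptions: the `modified_at` field, the function names and the sample rows are hypothetical, and a real source system may expose changes quite differently.

```python
from datetime import datetime


def changed_since(rows, last_run):
    """Timestamp approach: keep rows modified after the previous monitoring run."""
    return [r for r in rows if r["modified_at"] > last_run]


def diff_snapshots(old, new):
    """Data-comparison approach: compare two snapshots keyed by primary key."""
    inserted = [k for k in new if k not in old]
    deleted = [k for k in old if k not in new]
    updated = [k for k in new if k in old and new[k] != old[k]]
    return inserted, updated, deleted


rows = [
    {"id": 1, "modified_at": datetime(2024, 1, 1)},
    {"id": 2, "modified_at": datetime(2024, 1, 3)},
]
recent = changed_since(rows, last_run=datetime(2024, 1, 2))

old_snapshot = {1: "Alice", 2: "Bob", 3: "Carol"}
new_snapshot = {1: "Alice", 2: "Robert", 4: "Dave"}
inserted, updated, deleted = diff_snapshots(old_snapshot, new_snapshot)
```

Time stamping only works if the source reliably maintains a modification time, which is one reason it is hard to retrofit onto legacy systems. Snapshot comparison needs no cooperation from the source but costs a full scan per run.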

Operating a DW: Extraction
- Extraction
  - Reads the data selected during the monitoring phase and inserts it into the data structures of the workplace
  - Due to the large data volume, compression can be used
  - The time-point for performing extraction can be:
    - Periodical: weather or stock market information can be updated several times a day, while product specifications can be updated over longer periods
    - On request: for example, when a new item is added to a product group

Operating a DW: Extraction
- Extraction
  - The time-point for performing extraction can be:
    - Event driven: extraction is triggered by a point in time or by the number of modifications exceeding a specified threshold; for example, each night at 03:00 or each time 50 new modifications have taken place
    - Immediate: in some special cases, such as the stock market, changes may need to propagate immediately to the warehouse
  - The extraction largely depends on the hardware and software used for the DW and the data source
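The event-driven case can be illustrated with a small decision function that combines both triggers from the example (a nightly run at 03:00 and a threshold of 50 modifications). The function name, its parameters and the bookkeeping of `last_extract` are assumptions made for this sketch only.

```python
from datetime import datetime, time


def should_extract(now, last_extract, pending_changes,
                   nightly=time(3, 0), threshold=50):
    """Return True when either event-driven trigger fires."""
    # Count-based trigger: enough modifications have accumulated.
    if pending_changes >= threshold:
        return True
    # Time-based trigger: today's 03:00 has passed and no extraction has run since.
    todays_run = datetime.combine(now.date(), nightly)
    return now >= todays_run and last_extract < todays_run
```

A scheduler would call this periodically, for example `should_extract(datetime.now(), last_extract, pending_changes)`, and start the extraction whenever it returns True.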

Operating a DW: Transforming
- Transforming
  - Implies adapting data, schema, and data quality to the application requirements
- Data integration
  - Transformation into de-normalized data structures
  - Handling of key attributes
  - Adaptation of different types of the same data
  - Conversion of encoding: Buy, Sell -> 1, 2 vs. B, S -> 1, 2
  - Normalization: Michael Schumacher -> Michael, Schumacher vs. Schumacher Michael -> Michael, Schumacher
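The encoding conversion and name normalization examples above could look roughly like this in code. The source names (`source_a`, `source_b`), the lookup table and the `order` flag are hypothetical, and the name handling assumes simple two-part names only.

```python
# Hypothetical per-source code tables mapping local encodings to one warehouse code.
SIDE_CODES = {
    "source_a": {"Buy": 1, "Sell": 2},
    "source_b": {"B": 1, "S": 2},
}


def convert_side(source, value):
    """Translate a source-specific buy/sell encoding to the shared numeric code."""
    return SIDE_CODES[source][value]


def normalize_name(raw, order):
    """Return 'First, Last' given a two-part name and its source word order."""
    first, last = raw.split()
    if order == "last_first":
        first, last = last, first
    return f"{first}, {last}"
```

For example, `normalize_name("Schumacher Michael", "last_first")` and `normalize_name("Michael Schumacher", "first_last")` both yield the same target form, which is the point of the normalization step.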

Operating a DW: Transforming
- Data integration
  - Date handling: MM-DD-YYYY -> MM.DD.YYYY
  - Measurement units and scaling: 10 inch -> 25.4 cm; 30 mph -> 48.28 km/h
  - Save calculated values: Price_incl_VAT = Price_excl_VAT * 1.19
  - Aggregation
    - Daily sums can be added into weekly ones
    - Different levels of granularity can be used
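As a rough sketch, the integration steps above (date format conversion, unit conversion, stored calculated values and aggregation to a coarser granularity) map onto small functions like these. The function names are illustrative, the 19% VAT rate is the one from the slide's formula, and grouping by ISO calendar week is an assumption about how "weekly" is defined.

```python
from collections import defaultdict
from datetime import date, datetime


def convert_date(text):
    """MM-DD-YYYY -> MM.DD.YYYY, validating the date on the way."""
    return datetime.strptime(text, "%m-%d-%Y").strftime("%m.%d.%Y")


def inch_to_cm(inches):
    return inches * 2.54


def mph_to_kmh(mph):
    return mph * 1.609344


def price_incl_vat(price_excl_vat, rate=0.19):
    """Store the derived price; 19% is the rate used in the slide's formula."""
    return round(price_excl_vat * (1 + rate), 2)


def weekly_sums(daily):
    """Roll daily sums up to (ISO year, ISO week) totals."""
    totals = defaultdict(float)
    for day, amount in daily.items():
        year, week, _ = day.isocalendar()
        totals[(year, week)] += amount
    return dict(totals)


weekly = weekly_sums({
    date(2008, 12, 15): 10.0,
    date(2008, 12, 16): 5.0,
    date(2008, 12, 22): 7.0,
})
```

Storing the calculated value trades storage for query speed, which is usually acceptable in a warehouse because the data is loaded once and read many times.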

Operating a DW: Transforming
- Data cleaning
  - Consistency check: for example, flag records where Delivery_date < Order_date
  - Completeness: management of missing values as well as NULL values
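The two cleaning checks can be sketched as a single record validator. The field names and the choice to report problems as a list of strings are assumptions for illustration; how a warehouse actually handles a failing record (reject, repair, or quarantine) is a separate design decision not covered here.

```python
from datetime import date

REQUIRED = ("order_id", "order_date", "delivery_date")


def check_record(rec):
    """Return a list of data-quality problems found in one order record."""
    problems = []
    # Completeness: required fields must be present and not None (NULL).
    for field in REQUIRED:
        if rec.get(field) is None:
            problems.append(f"missing {field}")
    # Consistency: a delivery cannot precede its order.
    if not problems and rec["delivery_date"] < rec["order_date"]:
        problems.append("delivery_date before order_date")
    return problems
```

The consistency check is skipped when a required field is missing, since comparing against a NULL date would not be meaningful.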

Operating a DW: Loading
- Loading
  - Usually takes place during weekends or nights, when the system is not under user load
  - Split between the initial load, which initializes the DW, and the periodical load, which keeps the DW updated
- Initial loading
  - Implies big volumes of data, so a bulk loader is used
  - Usually performed by partitioning, parallelization and incremental updating
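The split between initial and periodical load can be illustrated with Python's built-in `sqlite3` module. This is only a toy stand-in: SQLite's `executemany` is not a vendor bulk loader, the `sales` table and the batch size are hypothetical, and the partitions are processed sequentially rather than in parallel.

```python
import sqlite3


def load_in_batches(conn, rows, batch_size=1000):
    """Insert rows batch by batch, committing once per batch."""
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        conn.executemany("INSERT INTO sales (id, amount) VALUES (?, ?)", batch)
        conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, amount REAL)")

# Initial load: the full history in one bulk pass.
initial = [(i, float(i)) for i in range(1, 2501)]
load_in_batches(conn, initial)

# Periodical load: only rows newer than what the warehouse already holds.
(max_id,) = conn.execute("SELECT MAX(id) FROM sales").fetchone()
source = [(i, float(i)) for i in range(1, 2601)]
load_in_batches(conn, [r for r in source if r[0] > max_id])

(row_count,) = conn.execute("SELECT COUNT(*) FROM sales").fetchone()
```

Splitting the input into batches is what makes parallelization possible in a real system, since independent partitions can be handed to separate loader processes.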

Operating a DW: Analyzing
- Analyze
  - Data access
    - Useful for extracting goal-oriented information: How many iPhone 3G units were sold in the Braunschweig stores of T-Mobile in the last 3 calendar weeks of 2008?
    - Although it is a common OLTP query, it might be too complex for the operational environment to handle
  - OLAP
    - Often wrongly equated with the DW, because it is used to analyze the data contained in the DW
    - Used to answer requests like:
      - In which district does a product group register the highest profit?
      - How did the profit change in comparison to the previous month?

Operating a DW: Analyzing
- Analyze
  - OLAP
    - Mostly organized on a multidimensional data model
    - Common operations for analysis are:
      - Pivoting/Rotation
      - Roll-up, Drill-down and Drill-across
      - Slice and Dice
  - Data mining
    - Useful for identifying hidden patterns
    - Refers to two separate processes:
      - KDD (Knowledge Discovery in Databases)
      - Prediction
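To make the OLAP operations named above more concrete, here is a small sketch over an in-memory list of fact rows. The data, the dimension names and the function names are invented for illustration; a real OLAP engine works on a cube or a star schema, not on Python lists.

```python
from collections import defaultdict

# Hypothetical fact rows: (district, product_group, month, profit).
FACTS = [
    ("North", "Phones", "2008-11", 100.0),
    ("North", "Phones", "2008-12", 150.0),
    ("North", "Laptops", "2008-12", 80.0),
    ("South", "Phones", "2008-12", 120.0),
]
DIMS = ("district", "product_group", "month")


def roll_up(facts, keep):
    """Aggregate profit over every dimension not listed in `keep`."""
    idx = [DIMS.index(d) for d in keep]
    totals = defaultdict(float)
    for row in facts:
        totals[tuple(row[i] for i in idx)] += row[3]
    return dict(totals)


def slice_(facts, dim, value):
    """Fix one dimension to a single value."""
    i = DIMS.index(dim)
    return [row for row in facts if row[i] == value]


def dice(facts, **allowed):
    """Restrict several dimensions to sets of values."""
    return [row for row in facts
            if all(row[DIMS.index(d)] in vals for d, vals in allowed.items())]


by_district = roll_up(FACTS, keep=("district",))
december = slice_(FACTS, "month", "2008-12")
north_phones = dice(FACTS, district={"North"}, product_group={"Phones"})
```

Drill-down is the inverse of roll-up: calling `roll_up(FACTS, keep=("district", "product_group"))` returns the same profit at a finer level of detail.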

Operating a DW: Analyzing
- Analyze
  - Data mining
    - Useful for answering questions like: How did the sales of this product group evolve?
    - Methods and procedures for data mining: Clustering, Classification, Regression, Association rule learning