Information Management course

Similar documents
Information Management course

What is a Data Warehouse?

Dta Mining and Data Warehousing

Chapter 4, Data Warehouse and OLAP Operations

Jarek Szlichta Acknowledgments: Jiawei Han, Micheline Kamber and Jian Pei, Data Mining - Concepts and Techniques

CS 412 Intro. to Data Mining

Data Warehousing & On-Line Analytical Processing

Data Warehousing & OLAP

Data Warehousing & On-line Analytical Processing

Data Warehouse. Concepts and Techniques. Chapter 3. SS Chung. April 5, 2013 Data Mining: Concepts and Techniques 1

CS490D: Introduction to Data Mining Chris Clifton

Analyse des Données. Master 2 IMAFA. Andrea G. B. Tettamanzi

A Multi-Dimensional Data Model

Data Mining. Data warehousing. Hamid Beigy. Sharif University of Technology. Fall 1394

ECT7110 Introduction to Data Warehousing

Data Warehousing 2. ICS 421 Spring Asst. Prof. Lipyeow Lim Information & Computer Science Department University of Hawaii at Manoa

ECLT 5810 Introduction to Data Warehousing

Data Mining. Data warehousing. Hamid Beigy. Sharif University of Technology. Fall 1394

Data Mining. Data warehousing. Hamid Beigy. Sharif University of Technology. Fall 1396

By Mahesh R. Sanghavi Associate professor, SNJB s KBJ CoE, Chandwad

Data mining: Hmm, what is it?

Introduction to Data Warehousing

DATA WAREHOUSING & DATA MINING. by: Prof. Asha Ambhaikar

Reminds on Data Warehousing

Acknowledgment. MTAT Data Mining. Week 7: Online Analytical Processing and Data Warehouses. Typical Data Analysis Process.

Database design View Access patterns Need for separate data warehouse:- A multidimensional data model:-

MSCIT 5210/MSCBD 5002: Knowledge Discovery and Data Mining

An Overview of Data Warehousing and OLAP Technology

Information Management course

Reporting and Query tools and Applications

Decision Support Systems

An Overview of Data Warehousing and OLAP Technology

MSCIT 5210/MSCBD 5002: Knowledge Discovery and Data Mining

Business Intelligence

DATA WAREHOUSE AND DATA MINING

CS490D: Introduction to Data Mining Prof. Chris Clifton

Basics of Dimensional Modeling

Data Quality. Data Cleaning and Integration. Data Cleaning. Data Preprocessing. Handling Missing Values. Disguised Missing Data?

Cognos also provides you an option to export the report in XML or PDF format or you can view the reports in XML format.

ETL and OLAP Systems

UNIT 4. DATA WAREHOUSING

This tutorial will help computer science graduates to understand the basic-to-advanced concepts related to data warehousing.

Decision Support Systems aka Analytical Systems

DATA WAREHOUING UNIT I

IT DATA WAREHOUSING AND DATA MINING UNIT-2 BUSINESS ANALYSIS

DATA WAREHOUSE EGCO321 DATABASE SYSTEMS KANAT POOLSAWASD DEPARTMENT OF COMPUTER ENGINEERING MAHIDOL UNIVERSITY

CS490D: Introduction to Data Mining Prof. Chris Clifton. Project Presentations

collection of data that is used primarily in organizational decision making.

Data Mining Concepts & Techniques

Data Warehouse and Data Mining

Table of Contents. Rajesh Pandey Page 1

Summary of Last Chapter. Course Content. Chapter 2 Objectives. Data Warehouse and OLAP Outline. Incentive for a Data Warehouse

This tutorial will help computer science graduates to understand the basic-to-advanced concepts related to data warehousing.

Sql Fact Constellation Schema In Data Warehouse With Example

Lectures for the course: Data Warehousing and Data Mining (IT 60107)

Acknowledgment. Sales data example. Outline. Excel pivot table. Example: Sales

OLAP2 outline. Multi Dimensional Data Model. A Sample Data Cube

Data Warehouses. Yanlei Diao. Slides Courtesy of R. Ramakrishnan and J. Gehrke

Data Warehousing. Wolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig

Fig 1.2: Relationship between DW, ODS and OLTP Systems

Data Warehousing & OLAP

Lecture 2 and 3 - Dimensional Modelling

Unit 7: Basics in MS Power BI for Excel 2013 M7-5: OLAP

Adnan YAZICI Computer Engineering Department

A Novel Approach of Data Warehouse OLTP and OLAP Technology for Supporting Management prospective

Improving the Performance of OLAP Queries Using Families of Statistics Trees

On-Line Analytical Processing (OLAP) Traditional OLTP

GUJARAT TECHNOLOGICAL UNIVERSITY MASTER OF COMPUTER APPLICATIONS (MCA) Semester: IV

CT75 DATA WAREHOUSING AND DATA MINING DEC 2015

Chapter 5, Data Cube Computation

Data Cube Technology

QUALITY MONITORING AND

The strategic advantage of OLAP and multidimensional analysis

Data Mining: Concepts and Techniques. Example of Star Schema. Multidimensional Data. A Short Recap on Data Cubes Data Cube: A Lattice of Cuboids

R07. FirstRanker. 7. a) What is text mining? Describe about basic measures for text retrieval. b) Briefly describe document cluster analysis.

Aggregating Knowledge in a Data Warehouse and Multidimensional Analysis

Tribhuvan University Institute of Science and Technology MODEL QUESTION

Data Cube Technology. Chapter 5: Data Cube Technology. Data Cube: A Lattice of Cuboids. Data Cube: A Lattice of Cuboids

CHAPTER 8 DECISION SUPPORT V2 ADVANCED DATABASE SYSTEMS. Assist. Prof. Dr. Volkan TUNALI

The University of Iowa Intelligent Systems Laboratory The University of Iowa Intelligent Systems Laboratory

Data Warehousing Conclusion. Esteban Zimányi Slides by Toon Calders

Data Warehouse Logical Design. Letizia Tanca Politecnico di Milano (with the kind support of Rosalba Rossato)

Data Mining: Concepts and Techniques. (3 rd ed.) Chapter 3

Data Mining: Concepts and Techniques. Chapter 4

CHAPTER 8: ONLINE ANALYTICAL PROCESSING(OLAP)

Data Analysis and Data Science

Data Warehousing and Decision Support

CSE 544 Principles of Database Management Systems. Alvin Cheung Fall 2015 Lecture 8 - Data Warehousing and Column Stores

Decision Support, Data Warehousing, and OLAP

Data Warehousing and Decision Support. Introduction. Three Complementary Trends. [R&G] Chapter 23, Part A

Assignment 1. Question 1: Brock Wilcox CS

Data Warehouse and Data Mining

Question Bank. 4) It is the source of information later delivered to data marts.

DATA MINING AND WAREHOUSING

Data Mining: Exploring Data. Lecture Notes for Chapter 3

REPORTING AND QUERY TOOLS AND APPLICATIONS

CT75 (ALCCS) DATA WAREHOUSING AND DATA MINING JUN

Data Warehousing and OLAP

Data Warehousing and Decision Support

Data Mining: Exploring Data. Lecture Notes for Chapter 3. Introduction to Data Mining

Transcription:

Università degli Studi di Milano Master Degree in Computer Science Information Management course Teacher: Alberto Ceselli Lecture 07 : 06/11/2012

Data Mining: Concepts and Techniques (3 rd ed.) Chapter 4 Jiawei Han, Micheline Kamber, and Jian Pei University of Illinois at Urbana-Champaign & Simon Fraser University 2011 Han, Kamber & Pei. All rights reserved. 2

Chapter 4: Data Warehousing and On-line Analytical Processing Data Warehouse: Basic Concepts Data Warehouse Modeling: Data Cube and OLAP Data Warehouse Design and Usage Data Warehouse Implementation Data Generalization by Attribute-Oriented Induction Summary 3

From Tables and Spreadsheets to Data Cubes A data warehouse is based on a multidimensional data model which views data in the form of a data cube A data cube, such as sales, allows data to be modeled and viewed in multiple dimensions Dimension tables, such as item (item_name, brand, type), or time(day, week, month, quarter, year) Fact table contains measures (such as dollars_sold) and keys to each of the related dimension tables In data warehousing literature, an n-d base cube is called a base cuboid. The top most 0-D cuboid, which holds the highest-level of summarization, is called the apex cuboid. The lattice of cuboids forms a data cube. 4

Data Cube: A Lattice of Cuboids all time item location supplier 0-D (apex) cuboid 1-D cuboids time,item time,location item,location location,supplier time,supplier item,supplier 2-D cuboids time,location,supplier time,item,location time,item,supplier item,location,supplier 3-D cuboids 4-D (base) cuboid time, item, location, supplier 5

Conceptual Modeling of Data Warehouses Modeling data warehouses: dimensions & measures Star schema: A fact table in the middle connected to a set of dimension tables Snowflake schema: A refinement of star schema where some dimensional hierarchy is normalized into a set of smaller dimension tables, forming a shape similar to snowflake Fact constellations: Multiple fact tables share dimension tables, viewed as a collection of stars, therefore called galaxy schema or fact constellation 7

Example of Star Schema time time_key day day_of_the_week month quarter year branch branch_key branch_name branch_type Measures Sales Fact Table time_key item_key branch_key location_key units_sold dollars_sold avg_sales item item_key item_name brand type supplier_type location location_key street city state_or_province country 8

Example of Snowflake Schema time time_key day day_of_the_week month quarter year Sales Fact Table time_key item_key item item_key item_name brand type supplier_key supplier supplier_key supplier_type branch branch_key branch_name branch_type Measures branch_key location_key units_sold dollars_sold avg_sales location location_key street city_key city city_key city state_or_province country 9

Example of Fact Constellation time time_key day day_of_the_week month quarter year Sales Fact Table time_key item_key branch_key item item_key item_name brand type supplier_type Shipping Fact Table time_key item_key shipper_key from_location branch branch_key branch_name branch_type Measures location_key units_sold dollars_sold avg_sales location location_key street city province_or_state country to_location dollars_cost units_shipped shipper shipper_key shipper_name location_key shipper_type 10

A Concept Hierarchy: Dimension (location) all all region Europe... North_America country Germany... Spain Canada... Mexico city Frankfurt... Vancouver... Toronto office L. Chan... M. Wind 11

Data cube measures Measure: a numeric function that can be evaluated at each point in the data cube space: Fact Aggregation of facts

Data Cube Measures: Three Categories Distributive: if the result derived by applying the function to n aggregate values is the same as that derived by applying the function on all the data without partitioning E.g., count(), sum(), min(), max() Algebraic: if it can be computed by an algebraic function with M arguments (where M is a bounded integer), each of which is obtained by applying a distributive aggregate function E.g., avg() = sum() / count(), min_n()... Holistic: if there is no constant bound on the storage size needed to describe a subaggregate. E.g., median(), mode(), rank() 13

Multidimensional Data Sales volume as a function of product, month, and region Region Dimensions: Product, Location, Time Hierarchical summarization paths Industry Region Year Category Country Quarter Product Product City Month Week Office Day Month 15

A Sample Data Cube TV PC VCR sum Product Date 1Qtr 2Qtr 3Qtr 4Qtr sum Total annual sales of TVs in U.S.A. U.S.A Canada Mexico Country sum 16

Cuboids Corresponding to the Cube all product date country product,date product,country date, country 0-D (apex) cuboid 1-D cuboids 2-D cuboids product, date, country 3-D (base) cuboid 17

Typical OLAP Operations Roll up (drill-up): summarize data by climbing up hierarchy or by dimension reduction Drill down (roll down): reverse of roll-up from higher level summary to lower level summary or detailed data, or introducing new dimensions Slice and dice: project and select Pivot (rotate): reorient the cube, visualization, 3D to series of 2D planes Other operations drill across: involving (across) more than one fact table 18

19

20

21

22

23

A Star-Net Query Model Shipping Method AIR-EXPRESS Customer Orders CONTRACTS Customer Time TRUCK ANNUALY QTRLY DAILY CITY COUNTRY ORDER PRODUCT LINE Product PRODUCT ITEM PRODUCT GROUP SALES PERSON DISTRICT REGION Location Each circle is called a footprint Promotion DIVISION Organization 24

Browsing a Data Cube Visualization OLAP capabilities Interactive manipulation 25

Chapter 4: Data Warehousing and On-line Analytical Processing Data Warehouse: Basic Concepts Data Warehouse Modeling: Data Cube and OLAP Data Warehouse Design and Usage Data Warehouse Implementation Data Generalization by Attribute-Oriented Induction Summary 26

Design of Data Warehouse: A Business Analysis Framework Four views regarding the design of a data warehouse Top-down view allows selection of the relevant information necessary for the data warehouse Data source view exposes the information being captured, stored, and managed by operational systems Data warehouse view consists of fact tables and dimension tables Business query view sees the perspectives of data in the warehouse from the view of end-user 27

Data Warehouse Design Process Top-down, bottom-up approaches or a combination of both Top-down: Starts with overall design and planning (mature) Bottom-up: Starts with experiments and prototypes (rapid) From software engineering point of view (1) planning (2) requirements study (3) problem analysis (4) warehouse design (5) data integration and testing (6) deployment Waterfall: structured and systematic analysis at each step before proceeding to the next (better for data warehouse) Spiral: rapid generation of increasingly functional systems, short turn around time, quick turn around (better for data marts) 28

Data Warehouse Design Process Typical data warehouse design process Choose a business process to model, e.g., orders, invoices, etc. Choose the grain (atomic level of data) of the business process Choose the dimensions that will apply to each fact table record Choose the measure that will populate each fact table record 29

Data Warehouse Development: A Recommended Approach Distributed Data Marts Multi-Tier Data Warehouse Data Mart Data Mart Enterprise Data Warehouse Model refinement Model refinement Define a high-level corporate data model 30

Data Warehouse Usage Three kinds of data warehouse applications Information processing supports querying, basic statistical analysis, and reporting using crosstabs, tables, charts and graphs Analytical processing multidimensional analysis of data warehouse data supports basic OLAP operations, slice-dice, drilling, pivoting Data mining knowledge discovery from hidden patterns supports associations, constructing analytical models, performing classification and prediction, and presenting the mining results using visualization tools 31

From On-Line Analytical Processing (OLAP) to On Line Analytical Mining (OLAM) Why online analytical mining? High quality of data in data warehouses DW contains integrated, consistent, cleaned data Available information processing structure surrounding data warehouses ODBC, OLEDB, Web accessing, service facilities, reporting and OLAP tools OLAP-based exploratory data analysis Mining with drilling, dicing, pivoting, etc. On-line selection of data mining functions Integration and swapping of multiple mining functions, algorithms, and tasks 32