Data Warehousing & Mining Techniques

Similar documents
Data Warehousing & Mining Techniques

2. Summary. 2.1 Basic Architecture. 2. Architecture. 2.1 Staging Area. 2.1 Operational Data Store. Last week: Architecture and Data model

Data Warehouse and Data Mining

Data Mining Concepts & Techniques

Data Warehousing & Mining Techniques

Chapter 3. The Multidimensional Model: Basic Concepts. Introduction. The multidimensional model. The multidimensional model

0. Organizational Issues. 0. Why should you be here? 0. Why should you be here? 0. Recommended Literature. 0. Why should you be here?

Fig 1.2: Relationship between DW, ODS and OLTP Systems

Data Warehousing & Data Mining

Data Warehousing. Wolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig

Question Bank. 4) It is the source of information later delivered to data marts.

Data Mining & Data Warehouse

This tutorial will help computer science graduates to understand the basic-to-advanced concepts related to data warehousing.

Data Warehouse and Data Mining

0. Organizational Issues. 0. Why should you be here? 0. Why should you be here? 0. Why should you be here? 0. Recommended Literature

Data Warehousing & OLAP

Data Warehouse and Data Mining

Data warehouse architecture consists of the following interconnected layers:

Physical Modeling of Data Warehouses using UML

Decision Support, Data Warehousing, and OLAP

CS614 - Data Warehousing - Midterm Papers Solved MCQ(S) (1 TO 22 Lectures)

A Novel Approach of Data Warehouse OLTP and OLAP Technology for Supporting Management prospective

Information Management course

Data Warehouse and Data Mining

Fundamentals of Information Systems, Seventh Edition

Summary. 4. Indexes. 4.0 Indexes. 4.1 Tree Based Indexes. 4.0 Indexes. 19-Nov-10. Last week: This week:

IDU0010 ERP,CRM ja DW süsteemid Loeng 5 DW concepts. Enn Õunapuu

DC Area Business Objects Crystal User Group (DCABOCUG) Data Warehouse Architectures for Business Intelligence Reporting.

Relational Database Systems 2 1. System Architecture

CHAPTER 3 Implementation of Data warehouse in Data Mining

Data Warehouse and Mining

Data Warehousing and OLAP Technologies for Decision-Making Process

Dta Mining and Data Warehousing

Overview. Introduction to Data Warehousing and Business Intelligence. BI Is Important. What is Business Intelligence (BI)?

Data Warehousing & Data Mining

Relational Database Systems 1

Decision Support Systems aka Analytical Systems

Full file at

Relational Database Systems 1

Data Warehousing Introduction. Toon Calders

ALTERNATE SCHEMA DIAGRAMMING METHODS DECISION SUPPORT SYSTEMS. CS121: Relational Databases Fall 2017 Lecture 22

Data Warehouse and Data Mining

Evolution of Database Systems

DATA MINING TRANSACTION

1 Dulcian, Inc., 2001 All rights reserved. Oracle9i Data Warehouse Review. Agenda

Data Mining. Data warehousing. Hamid Beigy. Sharif University of Technology. Fall 1394

Basics of Dimensional Modeling

1. Analytical queries on the dimensionally modeled database can be significantly simpler to create than on the equivalent nondimensional database.

Summary. Summary. 5. Optimization. 5.1 Partitioning. 5.1 Partitioning. 26-Nov-10. B-Trees are not fit for multidimensional data R-Trees

Data Mining. Associate Professor Dr. Raed Ibraheem Hamed. University of Human Development, College of Science and Technology

Seminars of Software and Services for the Information Society. Data Warehousing Design Issues

DATA MINING AND WAREHOUSING

An Overview of Data Warehousing and OLAP Technology

CHAPTER 8 DECISION SUPPORT V2 ADVANCED DATABASE SYSTEMS. Assist. Prof. Dr. Volkan TUNALI

Data Warehousing and OLAP

DATA WAREHOUSE EGCO321 DATABASE SYSTEMS KANAT POOLSAWASD DEPARTMENT OF COMPUTER ENGINEERING MAHIDOL UNIVERSITY

Relational Database Systems 1

TIM 50 - Business Information Systems

CSE 544 Principles of Database Management Systems. Alvin Cheung Fall 2015 Lecture 8 - Data Warehousing and Column Stores

Database design View Access patterns Need for separate data warehouse:- A multidimensional data model:-

TDWI Data Modeling. Data Analysis and Design for BI and Data Warehousing Systems

OLAP Introduction and Overview

Lecture2: Database Environment

On-Line Application Processing

CT75 DATA WAREHOUSING AND DATA MINING DEC 2015

Introduction to DWML. Christian Thomsen, Aalborg University. Slides adapted from Torben Bach Pedersen and Man Lung Yiu

Data Warehousing. Data Warehousing and Mining. Lecture 8. by Hossen Asiful Mustafa

Data Warehousing. Overview

Relational Database Systems 2 1. System Architecture

Meaning & Concepts of Databases

Management Information Systems MANAGING THE DIGITAL FIRM, 12 TH EDITION FOUNDATIONS OF BUSINESS INTELLIGENCE: DATABASES AND INFORMATION MANAGEMENT

What is a Data Warehouse?

Aggregating Knowledge in a Data Warehouse and Multidimensional Analysis

IT1105 Information Systems and Technology. BIT 1 ST YEAR SEMESTER 1 University of Colombo School of Computing. Student Manual

Warehousing. Data Mining

Data Warehouse. Asst.Prof.Dr. Pattarachai Lalitrojwong

Data Warehouses. Yanlei Diao. Slides Courtesy of R. Ramakrishnan and J. Gehrke

Cognos also provides you an option to export the report in XML or PDF format or you can view the reports in XML format.

Data Warehousing (1)

Data Warehousing ETL. Esteban Zimányi Slides by Toon Calders

collection of data that is used primarily in organizational decision making.

B.H.GARDI COLLEGE OF MASTER OF COMPUTER APPLICATION. Ch. 1 :- Introduction Database Management System - 1

The Data Organization

COWLEY COLLEGE & Area Vocational Technical School

Sql Fact Constellation Schema In Data Warehouse With Example

Chapter 6 VIDEO CASES

DATABASE DEVELOPMENT (H4)

Application software office packets, databases and data warehouses.

Data Warehousing and Decision Support (mostly using Relational Databases) CS634 Class 20

The Evolution of Data Warehousing. Data Warehousing Concepts. The Evolution of Data Warehousing. The Evolution of Data Warehousing

REPORTING AND QUERY TOOLS AND APPLICATIONS

Managing Information Resources

A Multi-Dimensional Data Model

UNIT -1 UNIT -II. Q. 4 Why is entity-relationship modeling technique not suitable for the data warehouse? How is dimensional modeling different?

CHAPTER 8: ONLINE ANALYTICAL PROCESSING(OLAP)

The COSMIC Functional Size Measurement Method Version 4.0.1

TIM 50 - Business Information Systems

Acknowledgment. MTAT Data Mining. Week 7: Online Analytical Processing and Data Warehouses. Typical Data Analysis Process.

Chapter 3. Foundations of Business Intelligence: Databases and Information Management

Managing Data Resources

Transcription:

2. Summary Data Warehousing & Mining Techniques Wolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de Last week: What is a Data Warehouse Applications and users Lifecycle and phases Architecture and Data model This lecture Data Warehousing & OLAP Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 2 2. Architecture 2. Architecture 2.1 Basic Architecture 2.2 Architectures in Practice 2.3 DW Storage Structures 2.4 DW Data Modeling 2.1 Basic Architecture Architecture of a DW in theory Data Sources Operational System Staging Area Warehouse Purchasing Users Operational System Metadata Summary Raw Data Data Reporting Flat files Inventory Mining Data Warehousing & OLAP Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 3 Data Warehousing & OLAP Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 4 2.1 Operational Data Store Databases that serve daily operations of the enterprise e.g. production, sales (cash register), accounting Usually rely on relational database technology (see RDB1) Optimized for small queries like: simple product lookups, inserts, updates and deletes 2.1 Staging Area Contains a separate copy of the data which will be loaded from ODS to the DW In the staging area the copied data is prepared (integrated, cleaned, etc.) Customers aren t invited to visit the kitchen Similar to a restaurant s kitchen, the data staging area should be accessible only to skilled DW professionals, neither ODS admins. nor analysts Data Warehousing & OLAP Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 5 Data Warehousing & OLAP Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 6 1

2.1 Data Warehouse 2.1 Presentation Area The DW persistently stores Cleaned raw data Derived (aggregated) data Usual aggregates of the raw data e.g. quarter sales per regions Performance reasons: avoid computing (the same) aggregates times and again at query time Metadata Describe the meaning, properties and origins of the data in the DW (e.g. provenance & lineage) The presentation area comprises where data is organized according to the focus of one department Similar to DB views, but usually stored (materialized view) Reporting as well as analytical processing tools This area is the Warehouse as far as the business community is concerned Data Warehousing & OLAP Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 7 Data Warehousing & OLAP Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 8 2.1 Building a complete DW Hardware and data flow architecture Complete data flow from ODS up to the presentation Most important step is the Extract Transform Load (ETL) process Storage structure The used model for storing data in the DW Data modeling Conceptual, logical and physical models for the DW storage structure 2.2 Architectures in Practice Popular DW architectures in practice Vertical tiers Generic Two-Tier Architecture Three-Tier Architecture Data Warehousing & OLAP Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 9 Data Warehousing & OLAP Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 10 2.2 Two-Tier Architecture Generic client-server architecture Fat or thin client depending on where the data analysis is performed Data Sources Operational System Staging Area Server Warehouse Purchasing Client Users 2.2 Thin Client Operations are executed on the server The client is just used to display the results This architecture fits well for Internet DW access Client HTTP, IIOP Server Operational System Flat files Metadata Summary Raw Data Data Inventory Reporting Mining Data storage Data Warehousing & OLAP Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 11 Data Warehousing & OLAP Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 12 2

2.2 Fat Client The server just delivers the data e.g. the corresponding data mart Operations are executed on the client Communication between client and server must be able to sustain large ODBC, JDBC, NFS data transfers Server Client 2.2 Three-Tier Architecture Tier 1: raw and detailed data intended to be the single source for all decision support Tier 2: derived data that had been aggregated for DSS support Tier 3: reporting and analysis Data storage Data Warehousing & OLAP Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 13 Data Warehousing & OLAP Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 14 2.2 Other Architectures N-Tier Architecture Higher tier architecture is also possible but the complexity grows with the number of tier-interfaces Web-based Architectures Advantage: Usage of existing software, reduction of costs, platform independence Disadvantage: Security overhead e.g. data encryption, user access and identification 2.2 Architectures in Practice Popular DW architectures in Practice Horizontal tiers Independent Data Mart Dependent Data Mart Logical Data Mart Data Warehousing & OLAP Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 15 Data Warehousing & OLAP Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 16 2.2 Independent Data Mart Mini warehouses limited in scope Separate ETL for each independent Data Mart High access complexity 2.2 Dependent Data Mart Single ETL for the DW are loaded from the DW More simple data access than in the previous case Data Warehousing & OLAP Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 17 Data Warehousing & OLAP Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 18 3

2.2 Logical Data Mart The ETL is near real-time are not separate databases, but logical views of the DW DW Application independent Centralized, Planned DW Historical, detailed, summarized Lightly denormalized 2.2 DW vs. Scope Specific DSS application Decentralized by user area Organic, possibly not planned Data Some history, detailed, summarized Highly denormalized DW Multiple subjects DW Many internal and external sources DW Flexible Data-oriented Long life Large Single complex structure Subjects Sources One central subject Few internal and external sources Other characteristics Restrictive Project oriented Short life Start small, becomes large Multiple, semi-complex structure, together complex Data Warehousing & OLAP Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 19 Data Warehousing & OLAP Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 20 2.2 Centralized vs. Distributed DW may be centralized or distributed Centralized DW (e.g. Volkswagen) Analytical queries are run only at the main enterprise location - no need to transport data via network High costs for large dedicated hardware Distributed DW (e.g. WalMart) More natural form due to corporations being active all over the world and having different types of hardware and software Higher overhead but lower cost Types of distributed DW Geographically distributed Local DW/global DW Technologically distributed DW Logically one DW, physically more DW Independently evolving distributed DW Uncontrolled growth Data Warehousing & OLAP Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 21 Data Warehousing & OLAP Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 22 Geographically distributed In the case of corporations spread around the world Information is needed both locally and globally A distributed DW makes sense When much processing occurs at the local level Even though local branches report to the same balance sheet, the local organizations are somewhat autonomous Typical example is franchising e.g. McDonald s China USA (HQ) Aggregated Data DW Asia DW USA Data Warehousing & OLAP Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 23 Data Warehousing & OLAP Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 24 4

Technologically distributed DW Placing the DW on the distributed technology of some vendor Advantages Entry costs are cheap large centralized hardware is expensive No theoretical limit on how much data can be placed in the DW new servers can be added to the network on demand As the DW starts to expand network communication starts playing an important role Example: Let s simplify and consider we have 4 nodes each holding data regarding a specific year Now let s consider a query which needs to access data from the last 4 years Large amount of data has to be shipped to processing units 2008 2007 2006 2005 Data Warehousing & OLAP Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 25 Data Warehousing & OLAP Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 26 Independently evolving distributed DW In practice there are many cases in which independent DW are developed concurrently in the same organization The first step in many corporations is to build a DW for financial or marketing Once this is successfully set up, other parts of the organization follow independently 2.3 DW Data Storage Different architectures Vertical, horizontal, centralized, distributed, etc. are all variations of the basic architecture How is the data storage performed for this data flow architecture? Data Warehousing & OLAP Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 27 Data Warehousing & OLAP Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 28 2.3 DW Data Storage DW users look at the data from different perspectives (dimensions) Consequently data presentation is multidimensional Typical dimensions are: time, location and product 2.3 DW Data Storage Example: The sales department of a car manufacturer takes a closer look at the sales volumes View historical sales volume figures from multiple perspectives volume by model, by color, by dealer, over time Data Warehousing & OLAP Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 29 Data Warehousing & OLAP Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 30 5

2.2 Multidim. Structure 2.3 DW Data Storage * Mini VAN 289 451 40 113 324 18 1560 455 The complexity grows quickly with the number of dimensions and the number of positions E.g. 3 dimensions with 10 values each Sedan 160 115 6 281 Coupe 16 12 16 44 Black Blue Red * Data Warehousing & OLAP Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 31 Data Warehousing & OLAP Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 32 2.3 DW Data Storage Visualization is multidimensional At the same time operational data is stored in relational model Data in the DW can be stored either according to the relational or multidimensional model Data Warehousing & OLAP Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 33 2.3 Relational vs. Multidim. Model Any database manipulation is possible with both technologies The multidimensional model however offers some advantages in the context of DW: Ease of data presentation Ease of maintenance Performance Data Warehousing & OLAP Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 34 2.3 Ease of Presentation The presentation is the natural output of the multidim. model Obtaining the same presentation in the relational model requires a complex query - think about the WalMart example: select sum(sales.quantity_sold) from sales, products, product_categories, manufacturers, stores, cities where manufacturer_name = Colgate and product_category_name = toothpaste and cities.population < 40 000 and trunc(sales.date_time_of_sale) = trunc(sysdate-1) and sales.product_id = products.product_id and sales.store_id = stores.store_id and products.product_category_id = product_categories.product_category_id and products.manufacturer_id = manufacturers.manufacturer_id and stores.city_id = cities.city_id 2.3 Ease of Maintenance Aggregates need to be maintained in the case of the multidimensional model The relational model use indexes and sophisticated joins which require significant maintenance and storage to provide same intuitiveness Data Warehousing & OLAP Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 35 Data Warehousing & OLAP Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 36 6

2.3 Performance 2.3 Performance Consider storing the data in DW according to the relational model We have to transform from data from relational to multidim. representation for each query Storing the data in DW in a multidim. model we perform the transformation on each load For DW, relational model can reach similar performance as the multidim. model through database tuning Not possible to tune the DW for all possible ad-hoc queries Conclusion: both models can be used, but the multidimensional model is the practical choice! How do we model the multidimensional representation? Data Warehousing & OLAP Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 37 Data Warehousing & OLAP Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 38 2.4 Data Modeling Data modeling - basics Is the process of creating a data model by analyzing the requirements needed to support the business processes of an organization It is sometimes called database modeling/design because a data model is eventually implemented in a database 2.4 Data Modeling Data models Provide the definition and format of data Graphical representations of the data within a specific area of interest Enterprise Data Model: represents the integrated data requirements of a complete business organization Subject Area Data Model: Represents the data requirements of a single business area or application Data Warehousing & OLAP Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 39 Data Warehousing & OLAP Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 40 2.4 Phases 2.4 Phases DBMS Independent DBMS Dependent Functional Application Program Transaction Implementation Application Requirement Conceptual Logical Physical Data requirements Conceptual schema Logical schema Conceptual Transforms data requirements to conceptual model Conceptual model describes data entities, relationships, constraints, etc. on high-level Does not contain any implementation details Independent of used software and hardware Logical (next lecture) Maps the conceptual data model to the logical data model used by the DBMS E.g. relational model, dimensional model, Technology independent conceptual model is adapted to the used DBMS software Physical (next lecture) Creates internal structures needed to efficiently store/manage data Table spaces, indexes, access paths, Depends on used hardware and DBMS software Data Warehousing & OLAP Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 41 Data Warehousing & OLAP Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 42 7

2.4 Phases 2.4 Conceptual Model Going from one phase to the next: The phase must be complete The result serves as input for the next phase Often automatic transition is possible with additional designer feedback Conceptual ER-diagram, UML, Logical Tables, Columns, Physical Tablespaces, Indexes, Data Warehousing & OLAP Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 43 Highest conceptual grouping of ideas Data tends to naturally cluster with data from the same or similar categories relevant to the organization The major relationships between subjects have been defined Least amount of detail Data Warehousing & OLAP Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 44 2.4 Conceptual Model 2.4 Conceptual Model Conceptual design Entity-Relationship (ER) Modeling Entities - things in the real world Conceptual ER-diagram, UML, E.g. Car, Account, Product Car Account Product Attributes property of an entity, entity type, or relationship type Car Color E.g. color of a car, balance of an account, price of a product Relationships between entities there can be relationships, which also can have attributes E.g. Person owns Car Person owns Car registration number Student 1 enrolls N N 1 attends name time N Course of Study N id title day of week Lecture N N part of Lecture instance instantiates N name 1 room credits N prereq. curriculum semester teaches semester id N Professor name department Data Warehousing & OLAP Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 45 Data Warehousing & OLAP Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 46 2.4 Conceptual Model Conceptual design in usually done using the Unified Modeling Language (UML) Conceptual Class Diagram, Component Diagram, Object Diagram, Package Diagram For Data Modeling only Class Diagrams are used Entity type becomes class Relationships become associations There are special types of associations like: aggregation, composition, or generalization CLASS NAME ER-diagram, UML, attribute 1 : domain attribute n : domain operation 1 operation m Data Warehousing & OLAP Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 47 2.4 Logical Model Logical design arranges data into a logical structure Logical Tables, Columns, Which can be mapped into the storage objects supported by DBMS In the case of RDB, the storage objects are tables which store data in rows and columns Attribute Tuple Relation Data Warehousing & OLAP Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 48 8

2.4 Physical Model Physical design specifies the physical configuration of the database on the storage media Detailed specification of: data elements, data types, indexing options, and other parameters residing in the DBMS data dictionary Physical Tablespaces Indexes 2.4 Data Modeling for DW For DW the models have to consider support for multidimensional data In the relational model the classical goal is to Remove redundancy Allow efficient retrieval of individual records In the case of DW Redundancy is necessary to speed up queries OLAP queries usually involve multiple records (range queries) and aggregates Data Warehousing & OLAP Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 49 Data Warehousing & OLAP Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 50 2.4 Multidim. Conceptual Model Modeling business queries Define the purpose of the DW and decide on the subject(s) Time Identify questions of interest Who bought the products? Customers (customers and their structure) Who sold the product? (sales organization) What was sold? (product structure) When was it sold? (time structure) Business Model Products Employees Data Warehousing & OLAP Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 51 2.4 Multidim. Conceptual Model Components of conceptual design for DW Facts: a fact is a focus of interest for decision-making, e.g., sales, shipments.. Measures: attributes that describe facts from different points of view, e.g., each sale is measured by its revenue Dimensions: discrete attributes which determine the granularity adopted to represent facts, e.g. product, store, date Hierarchies: are made up of dimension attributes Determine how facts may be aggregated and selected, e.g., day month quarter - year Data Warehousing & OLAP Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 52 2.4 Multidim. Conceptual Model Conceptual design models for DW Multidimensional Entity Relationship (ME/R) Model Multidimensional UML (muml) Other methods e.g., Dimension Fact Model, Totok approach, etc. ME/R Model Its purpose is to create an intuitive representation of the multidimensional data that is optimized for high-performance access It represents a specialization and evolution of the E/R to allow specification of multidimensional semantics Data Warehousing & OLAP Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 53 Data Warehousing & OLAP Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 54 9

ME/R notation was influenced by the following considerations Specialization of the E/R model All new elements of the ME/R have to be specializations of the E/R elements In this way the flexibility and power of expression of the E/R models are not reduced Minimal expansion of the E/R model Easy to understand/learn/use: the number of additional elements should be small Representation of the multidimensional semantics Although being minimal, it should be powerful enough to be able to represent multidimensional semantics There are 3 main ME/R constructs The fact node The level node A special binary classification edge Fact Characteristics Classification level Data Warehousing & OLAP Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 55 Data Warehousing & OLAP Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 56 Lets consider a store scenario designed in E/R Entities bear little semantics E/R doesn t support classification levels ME/R notation: Prod. Categ Prod. Family Prod. Group Article Package 1 Is packed in Belongs to 1 Product group n n Article n Article Nr Date is sold m District Store City Is in Name Country Year Region Quarter District Week City Month Store Day Characteristics Data Warehousing & OLAP Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 57 Data Warehousing & OLAP Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 58 ME/R notation: was selected as fact node The dimensions are product, geographical area and time The dimensions are represented through the so called Basic Article Store Classification Level Characteristics Day Alternative paths in the classification level are also possible Week Month Day 2.4 Unified Modeling Language UML is a general purpose modeling language It can be tailored to specific domains by using the following mechanisms Stereotypes: building new elements Tagged values: new properties Constraints: new semantics Data Warehousing & OLAP Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 59 Data Warehousing & OLAP Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 60 10

2.4 muml 2.4 muml Stereotype Grants a special semantics to UML construct without modifying it There are 4 possible representations of the stereotype in UML Icon Decoration Label None Fact 1 Fact 2 <<Fact>> Fact 3 Fact 4 Tagged value Define properties by using a pair of tag and data value Tag = Value E.g. formula= UnitsSold*UnitPrice <<Fact-Class>> UnitsSold: UnitPrice: Price /VolumeSold: Price {formula= UnitsSold*UnitPrice, parameter= UnitsSold, UnitPrice } Data Warehousing & OLAP Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 61 Data Warehousing & OLAP Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 62 <<Shared -Roll-up>> Year Week Week 2.4 muml Year Land Country 1..2 Year Architectures: Region Quarter Month Month Day <<Dimension>> Time Quarter <<Fact-Class>> 1..* <<Fact-Class>> Sold products Region City City Store <<Dimension>> Geography <<Dimension>> Product Distributor Country Product Product Group Prod. Group Product categ Prod. Categ Summary Basic architecture, vertical three-tier architecture, horizontal dependent/independent data mart architecture DW may be centralized or geographically and technologically distributed Data Modeling: Data in the DW is represented in a multidimensional manner Multidimensional conceptual model Multidimensional Entity Relationship (ME/R) Model Multidimensional UML (muml) Data Warehousing & OLAP Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 63 Data Warehousing & OLAP Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 64 Next lecture Data Modeling (continued) Logical model Physical model Data Warehousing & OLAP Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 65 11