Star Schema. Data Warehouses. Star Schema Example 1.

Transcription:

Star Schema

In a star schema, each dimension table has a single-part primary key that links to one part of the multipart primary key in the fact table.

Data Warehouses: logical design of a multidimensional database using a tabular schema

Star Schema Example 1
Time dimension: Day_of_week, Day_number_of_month, Week_number_in_year, Month, Quarter, Year, Holiday_flag, Weekday_flag
Fact table: Product_key, Store_key, Dollars_sold, Units_sold
Product dimension: Product_key, Description, Brand, Category
Store dimension: Store_key, Store_name, Address, Floor_plan_type

Star Schema (general form)
Dimension tables (Dimension 1 to Dimension 4): mainly descriptive, textual. Each has its own key (d1_key1, d2_key1, d3_key1, d4_key1) and attributes (Att1, Att2).
Fact table: one key per dimension (shown as d1_key1, d2_key2, d3_key1, d4_key1) plus the facts fact1 and fact2, which are mainly numeric and additive.
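The structure of Example 1 can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the table names (time_dim, product_dim, store_dim, sales_fact) and the time_key column are hypothetical and do not come from the slide; the remaining columns follow the attribute lists above.

```python
import sqlite3

# Hypothetical in-memory star schema. Table names and time_key are assumptions
# made for this sketch; the other columns follow Star Schema Example 1.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time_dim (
    time_key INTEGER PRIMARY KEY,          -- single-part dimension key
    day_of_week TEXT, month INTEGER, quarter TEXT, year INTEGER,
    holiday_flag INTEGER, weekday_flag INTEGER
);
CREATE TABLE product_dim (
    product_key INTEGER PRIMARY KEY,
    description TEXT, brand TEXT, category TEXT
);
CREATE TABLE store_dim (
    store_key INTEGER PRIMARY KEY,
    store_name TEXT, address TEXT, floor_plan_type TEXT
);
CREATE TABLE sales_fact (
    time_key INTEGER REFERENCES time_dim(time_key),
    product_key INTEGER REFERENCES product_dim(product_key),
    store_key INTEGER REFERENCES store_dim(store_key),
    dollars_sold REAL,                     -- numeric, additive facts
    units_sold INTEGER,
    PRIMARY KEY (time_key, product_key, store_key)  -- multipart key
);
""")


def fact_key_columns(connection):
    """Return the columns of the fact table's multipart primary key, in order."""
    rows = connection.execute("PRAGMA table_info(sales_fact)").fetchall()
    # Each row is (cid, name, type, notnull, dflt_value, pk); pk > 0 marks key parts.
    key_rows = sorted((r for r in rows if r[5] > 0), key=lambda r: r[5])
    return [r[1] for r in key_rows]


def fact_foreign_keys(connection):
    """Return (fact column, referenced dimension table) pairs, sorted."""
    rows = connection.execute("PRAGMA foreign_key_list(sales_fact)").fetchall()
    # Each row is (id, seq, table, from, to, on_update, on_delete, match).
    return sorted((r[3], r[2]) for r in rows)
```

Each part of the fact table's primary key points at exactly one dimension table, which is the property the definition above describes.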

Star Schema Example 2 and Star Schema Example 3 (diagrams)

Reminder: Normal Forms
Normalization seeks to eliminate data redundancy, so that a transaction that changes any data needs to touch the database in only one place (optimized for updates).

The Standard Template Query
    SELECT p.brand, SUM(f.dollars), SUM(f.units)
    FROM sales f, product p, time t
    WHERE f.product_key = p.product_key
      AND f.time_key = t.time_key
      AND t.quarter = '1 Q 1995'
    GROUP BY p.brand
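To see the template query run, the following sketch loads a few made-up rows into an in-memory SQLite database. The sample data (brands, keys, amounts) is entirely hypothetical, the tables hold only the columns the query touches, and an ORDER BY is added so the output order is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (product_key INTEGER PRIMARY KEY, brand TEXT);
CREATE TABLE time (time_key INTEGER PRIMARY KEY, quarter TEXT);
CREATE TABLE sales (product_key INTEGER, time_key INTEGER,
                    dollars REAL, units INTEGER);
-- Hypothetical sample rows, for illustration only.
INSERT INTO product VALUES (1, 'Acme'), (2, 'Bolt');
INSERT INTO time VALUES (10, '1 Q 1995'), (11, '2 Q 1995');
INSERT INTO sales VALUES (1, 10, 100.0, 4), (1, 10, 50.0, 1),
                         (2, 10, 30.0, 2), (2, 11, 999.0, 9);
""")

# The standard template: join the fact table to the needed dimensions,
# constrain on dimension attributes, group by dimension attributes,
# and aggregate the additive facts.
TEMPLATE_QUERY = """
SELECT p.brand, SUM(f.dollars), SUM(f.units)
FROM sales f, product p, time t
WHERE f.product_key = p.product_key
  AND f.time_key = t.time_key
  AND t.quarter = ?
GROUP BY p.brand
ORDER BY p.brand
"""


def brand_totals(connection, quarter):
    """Return (brand, total dollars, total units) rows for one quarter."""
    return connection.execute(TEMPLATE_QUERY, (quarter,)).fetchall()


print(brand_totals(conn, "1 Q 1995"))
```

The row dated in the second quarter is filtered out by the constraint on the time dimension, so it does not contribute to either brand's totals.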

On the other hand
1. Complexity of query specification is high. Without normalization the schema is much clearer to the user (simple query structures).
2. Poor access efficiency. A normalized design is by far the worst for most query access. It is optimized for key-based, record-at-a-time inquiry, or for table-level queries that efficiently use the provided indexes.

Resisting Normalization
1. Eliminate redundancy? Eliminating duplicate rows is generally good. However, eliminating "redundant" attributes in a star schema dimension table will actually destroy its high access efficiency. Time saving (browsing performance) is much more critical in a data warehouse.
2. Save space? This corollary to eliminating redundancy is a holdover from another era. The relative impact of storage on cost is way down; the loss of access efficiency has a far greater cost impact. Furthermore, the fact table in a dimensional schema is naturally highly normalized. Disk space saving due to normalization is typically less than 1%.
3. Support efficient update? This does not apply at all. A data warehouse is nonvolatile: there are no updates of data, only data loading. The load methods for relational tables in a star schema design can actually be more efficient than a load of normalized transaction data and snowflaked reference data.

Why does normalization of dimensions not save space? A typical example:
Fact table data size: 30GB
Fact table index size: 20GB
Largest dimension table size: 0.1GB
Savings by normalization: 0.05GB
Total size before: 51GB
Total size after: 50.5GB
Whether measured by the listed saving (0.05GB) or by the difference between the two totals (0.5GB), the saving is below 1% of the total size.

Figure (ER, BCNF): Division (Division_id, Division_desc); Dept (Dept_desc, Division_id); Region (Region_id, Region_desc); Market (Market_desc, Region_id); Facts (Week_id).
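The "less than 1%" claim can be checked against the example figures with a few lines of arithmetic. The sketch below is only a worked calculation; the function name is hypothetical, and both readings of the saving (the listed 0.05GB and the 0.5GB difference between the two totals) are evaluated because the example's figures allow either.

```python
def saving_fraction(total_before_gb, saved_gb):
    """Fraction of the total warehouse size saved by normalizing dimensions."""
    return saved_gb / total_before_gb


TOTAL_BEFORE_GB = 51.0
TOTAL_AFTER_GB = 50.5
LISTED_SAVING_GB = 0.05

# Reading 1: the saving listed explicitly in the example.
listed = saving_fraction(TOTAL_BEFORE_GB, LISTED_SAVING_GB)

# Reading 2: the difference between the two totals in the example.
from_totals = saving_fraction(TOTAL_BEFORE_GB, TOTAL_BEFORE_GB - TOTAL_AFTER_GB)

print(f"listed saving:      {listed:.2%}")
print(f"saving from totals: {from_totals:.2%}")
```

Under either reading the saving stays under 1%, because the fact table and its index dominate the total size and normalization only touches the comparatively tiny dimension tables.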

Snowflake Schema

In a snowflake schema, one or more dimension tables are decomposed into multiple tables, with the subordinate dimension tables joined to a primary dimension table instead of to the fact table. In other words, it is a refinement of the star schema in which some dimensional hierarchy is normalized into a set of smaller dimension tables, forming a shape similar to a snowflake.

Figure (Dimensional, denormalized): Dept Lookup (Dept_desc, Division_desc); Facts (Week_id); Market Lookup (Market_desc, Region_desc).

Figure (Snowflake Schema, Large Hierarchy): Customer, amount, Name, Demographic, Income_Level, Age_Level, Sex.
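The difference between the two layouts shows up in the joins a query needs. The sketch below builds a department dimension both ways in SQLite and totals a fact by division. It assumes a hypothetical dept_id key, an amount fact, and made-up sample rows; none of these details appear in the slide figures.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Snowflake: the department table points to a separate division table.
CREATE TABLE division (division_id INTEGER PRIMARY KEY, division_desc TEXT);
CREATE TABLE dept_snow (dept_id INTEGER PRIMARY KEY, dept_desc TEXT,
                        division_id INTEGER REFERENCES division(division_id));
-- Star: the division description is repeated on every department row.
CREATE TABLE dept_star (dept_id INTEGER PRIMARY KEY, dept_desc TEXT,
                        division_desc TEXT);
CREATE TABLE facts (dept_id INTEGER, week_id INTEGER, amount REAL);

-- Hypothetical sample rows, for illustration only.
INSERT INTO division VALUES (1, 'Food'), (2, 'Apparel');
INSERT INTO dept_snow VALUES (10, 'Bakery', 1), (11, 'Dairy', 1), (20, 'Shoes', 2);
INSERT INTO dept_star VALUES (10, 'Bakery', 'Food'), (11, 'Dairy', 'Food'),
                             (20, 'Shoes', 'Apparel');
INSERT INTO facts VALUES (10, 1, 5.0), (11, 1, 7.0), (20, 1, 3.0), (10, 2, 1.0);
""")

# Star: one join from the fact table to the dimension.
STAR_QUERY = """
SELECT d.division_desc, SUM(f.amount)
FROM facts f JOIN dept_star d ON f.dept_id = d.dept_id
GROUP BY d.division_desc ORDER BY d.division_desc
"""

# Snowflake: an extra join to reach the subordinate dimension table.
SNOWFLAKE_QUERY = """
SELECT v.division_desc, SUM(f.amount)
FROM facts f
JOIN dept_snow d ON f.dept_id = d.dept_id
JOIN division v ON d.division_id = v.division_id
GROUP BY v.division_desc ORDER BY v.division_desc
"""

star_result = conn.execute(STAR_QUERY).fetchall()
snowflake_result = conn.execute(SNOWFLAKE_QUERY).fetchall()
print(star_result)
print(snowflake_result)
```

Both queries return the same totals; the snowflake version pays for its normalized hierarchy with one more join per level.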

Figure (Mini-Dimension): Customer, amount, Name, Demographic, Income_Level, Age_Level, Sex.

Star schemas or Snowflake schemas?
Both star and snowflake schemas can represent the same dimensional model; the difference is in their RDBMS implementations. Snowflake schemas support ease of dimension maintenance because they are more normalized. Star schemas are easier for direct user access and often support simpler and more efficient queries. The decision to model a dimension as a star or a snowflake depends on the nature of the dimension itself, such as how frequently it changes and which of its elements change, and often involves evaluating tradeoffs between ease of use and ease of maintenance. In most designs, star schemas are preferable to snowflake schemas because they involve fewer joins for information retrieval.

Surrogate keys
A surrogate key is the primary key for a dimension table and is independent of any keys provided by source data systems. Surrogate keys are created and maintained in the data warehouse and should not encode any information about the contents of records; automatically increasing integers make good surrogate keys. The original key for each record may be carried in the dimension table but is not used as the primary key.
Benefits: a layer of isolation between the DW and the source systems; simple numeric keys; the ability to handle ambiguous IDs.
Drawback: increased ETL processing.

Dimension Keys Using Original Operational Keys
Benefit: reduced transformation effort.
Drawbacks:
Compound and textual keys.
Dependency on the source systems (OLTP). For instance, what happens if the operational system creates a new key when a customer changes address, while we do not want to create a new customer?
Ambiguous IDs coming from different sources: multiple application systems; worldwide companies with many branches, where each branch uses its own customer numbering; companies that have gone through mergers or acquisitions.
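A minimal sketch of surrogate key assignment during loading, using SQLite's auto-incrementing integer key. The customer_dim table, its columns, the branch names, and the lookup_or_create helper are all hypothetical; the point is only that two branches reusing the same operational ID get distinct warehouse keys, while the original key is carried along as an ordinary attribute.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE customer_dim (
    customer_key INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
    source_system TEXT NOT NULL,   -- where the record came from
    source_id TEXT NOT NULL,       -- original operational key, kept as an attribute
    name TEXT,
    UNIQUE (source_system, source_id)
)
""")


def lookup_or_create(connection, source_system, source_id, name):
    """Return the surrogate key for a source record, creating it if needed."""
    row = connection.execute(
        "SELECT customer_key FROM customer_dim "
        "WHERE source_system = ? AND source_id = ?",
        (source_system, source_id),
    ).fetchone()
    if row is not None:
        return row[0]
    cursor = connection.execute(
        "INSERT INTO customer_dim (source_system, source_id, name) "
        "VALUES (?, ?, ?)",
        (source_system, source_id, name),
    )
    return cursor.lastrowid


# Two branches both use operational ID '17' for different customers.
key_a = lookup_or_create(conn, "branch_a", "17", "Ann")
key_b = lookup_or_create(conn, "branch_b", "17", "Bob")
# Loading the same source record again reuses the existing surrogate key.
key_a_again = lookup_or_create(conn, "branch_a", "17", "Ann")
print(key_a, key_b, key_a_again)
```

The extra lookup on every load is the "increased ETL processing" drawback in concrete form.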

Time/Date Dimension

For hourly time granularity, the hour breakdown can be incorporated into the date dimension or placed in a separate dimension. Business needs influence this design decision. If the main use is to extract contiguous chunks of time that cross day boundaries (for example, 11/24/2000 10 p.m. to 11/25/2000 6 a.m.), then it is easier if the hour and day are in the same dimension. However, it is easier to analyze cyclical and recurring daily events if they are in separate dimensions. Unless there is a clear reason to combine date and hour in a single dimension, it is generally better to keep them in separate dimensions.

A date dimension with one record per day will suffice if users do not need time granularity finer than a single day. A date-by-day dimension table will contain 365 records per year (366 in leap years).

A separate time dimension table should be constructed if a fine time granularity, such as minute or second, is needed. A time dimension table of one-minute granularity will contain 1,440 rows for a day, and a table of seconds will contain 86,400 rows for a day. If exact event time is needed, it should be stored in the fact table.

When a separate time dimension is used, the fact table contains one foreign key for the date dimension and another for the time dimension. Separate date and time dimensions simplify many filtering operations. For example, summarizing data for a range of days requires joining only the date dimension table to the fact table. Analyzing cyclical data by time period within a day requires joining just the time dimension table. The date and time dimension tables can both be joined to the fact table when a specific time range is needed.
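The row counts quoted above follow directly from how the two dimensions are generated. The sketch below builds a one-row-per-day date dimension and a one-row-per-minute time dimension as plain Python dictionaries. The column names are hypothetical, and weekday_flag assumes a Monday-to-Friday working week.

```python
from datetime import date, timedelta


def build_date_dimension(year):
    """Build one row per calendar day of the given year."""
    rows = []
    day = date(year, 1, 1)
    while day.year == year:
        rows.append({
            "date_key": len(rows) + 1,            # plain sequential surrogate key
            "full_date": day.isoformat(),
            "day_of_week": day.isoweekday(),      # 1 = Monday ... 7 = Sunday
            "month": day.month,
            "quarter": (day.month - 1) // 3 + 1,
            "year": day.year,
            "weekday_flag": day.isoweekday() <= 5,  # assumes a Mon-Fri week
        })
        day += timedelta(days=1)
    return rows


def build_time_dimension():
    """Build one row per minute of a day."""
    return [
        {"time_key": hour * 60 + minute + 1, "hour": hour, "minute": minute}
        for hour in range(24)
        for minute in range(60)
    ]


print(len(build_date_dimension(2000)))  # leap year
print(len(build_date_dimension(2001)))
print(len(build_time_dimension()))
```

Keeping the two tables separate means the minute-level table stays at a fixed, small size no matter how many years of dates are loaded; combining them at minute granularity would multiply the two row counts.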