Data Warehousing. Overview


1 Data Warehousing Overview
Basic Definitions
Normalization
    Entity Relationship Diagrams (ERDs)
    Normal Forms
    Many to Many relationships
Warehouse Considerations
    Dimension Tables
    Fact Tables
    Star Schema
    Snowflake Schema
Further Warehouse Design Considerations
    Changing Dimensions
    Conformed Dimensions

2 Data warehouse
A data warehouse is a copy of transaction data specifically structured for querying and reporting. Put another way, it is a collection of computerized data organized to best support reporting and analysis activity.

OLTP - On-Line Transaction Processing
OLTP describes a type of processing that databases are designed to support. OLTP applications need to support a high number of transactions per unit of time. A transaction is a set of Insert, Update, and sometimes Delete statements that must succeed or fail as a unit. Transactions typically perform such functions as recording orders and depleting inventory. Electronic banking and order processing are common OLTP applications.

OLAP - On-Line Analytical Processing
In its broadest usage, the term "OLAP" is used as a synonym for "data warehousing". The term "On-Line Analytical Processing" was coined to distinguish data warehousing activities from On-Line Transaction Processing. In a narrower usage, the term OLAP is used to refer to the tools used for Multidimensional Analysis

3 Sample Star Schema: but when people speak of OLAP they may properly be referring to a schema like this one in a relational database.

4 Database Normalization
Normalization reduces redundant data storage by organizing data efficiently. There are many ways to normalize a database consistently within a set of business requirements. Normalization reduces the potential for anomalies during data manipulation operations. Non-normalized databases are vulnerable to data anomalies because they store data redundantly. If data is stored in two locations but is later updated in only one of them, the data becomes inconsistent; this is referred to as an update anomaly. To avoid data anomalies, non-primary-key data in a normalized database is stored in only one location. If you need a Department's physical location, you should need to look only in the Department table.
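The update anomaly described above can be reproduced in a few lines. This is a minimal sketch using Python's built-in sqlite3 module; the EmployeeFlat table, its columns, and the sample rows are hypothetical and are not taken from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical unnormalized table: the department's location is
    -- repeated on every employee row.
    CREATE TABLE EmployeeFlat (
        employeeid INTEGER PRIMARY KEY,
        lastname   TEXT,
        deptname   TEXT,
        location   TEXT
    );
    INSERT INTO EmployeeFlat VALUES
        (1, 'Adams', 'Sales', 'Building A'),
        (2, 'Baker', 'Sales', 'Building A');
""")

# The Sales department moves, but only one of the redundant copies is updated.
conn.execute(
    "UPDATE EmployeeFlat SET location = 'Building B' WHERE employeeid = 1"
)

# The same department now reports two different locations.
sales_locations = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT location FROM EmployeeFlat "
        "WHERE deptname = 'Sales' ORDER BY location"
    )
]
print(sales_locations)
```

With the location stored once in a Department table, the same move would be a single-row update and the inconsistency could not arise.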

5 Unnormalized Table
We could design a database so that each record we read about a specific type of business object has all the information we'd typically need about that object type. But

6 This schema is more typical of a normalized database. We could generate the information on the previous page with this query:

Select e.employeeid, e.lastname, e.firstname, d.deptid, d.name, d.location
From Department d
Inner Join Employee e On d.deptid = e.deptid
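To try the query, here is a small self-contained sketch using Python's sqlite3 module. The column names follow the query on this slide; the column types and the sample rows are assumptions, since the schema diagram is not reproduced in this transcription.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Department (
        deptid   INTEGER PRIMARY KEY,
        name     TEXT,
        location TEXT
    );
    CREATE TABLE Employee (
        employeeid INTEGER PRIMARY KEY,
        lastname   TEXT,
        firstname  TEXT,
        deptid     INTEGER REFERENCES Department(deptid)
    );
    -- Made-up sample data: the location is stored once, in Department.
    INSERT INTO Department VALUES (10, 'Sales', 'Building A');
    INSERT INTO Employee VALUES (1, 'Adams', 'Ann', 10), (2, 'Baker', 'Bob', 10);
""")

# Same join as on the slide, with an ORDER BY added for a stable result.
rows = conn.execute("""
    SELECT e.employeeid, e.lastname, e.firstname, d.deptid, d.name, d.location
    FROM Department d
    INNER JOIN Employee e ON d.deptid = e.deptid
    ORDER BY e.employeeid
""").fetchall()

for row in rows:
    print(row)
```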

7 When we normalize, we're building a logical hierarchy.

8 Entities
Entities are classes of objects that are of interest from a business standpoint and about which information needs to be maintained. In the process of modeling, they evolve into database tables.
Entities are always nouns in business narratives (but not all nouns in business narratives are entities). Examples: Employee, Department, Project.
Entities must have attributes, or properties, that need to be known; these become columns.
    Employee: Name, Birth Date, Salary
    Department: Name, Number, Location
Each entity is representative of a class of objects, and each instance of a well-formed entity will map to a row in a table.
Each instance of an entity must be uniquely distinguishable from other instances of the same entity. An attribute or set of attributes that uniquely identifies an instance of an entity is called a Unique Identifier (UID).

9 Relationship
A relationship is a bi-directional, significant association between two entities, or between an entity and itself.
Each (direction of a) relationship has:
    Name
    Optionality: either "Must Be" or "May Be"
    Degree/Cardinality/Ordinality: 1:1 or 1:M (or M:M). Degree = 0 is expressed as "may be".
Examples:
    Each employee must be assigned to one and only one department.
    Each department may be responsible for one or more employees.
Our definitions for entities, attributes, and relationships must be equally valid for every instance, not for the normal case only. This point is critically important.

10 First Normal Form
1NF requires that each attribute store only one value. There can be no repeating groups (= no multivalued attributes). Each attribute of the table is said to be atomic. For example, each record in the Home table below should have only one owner. Each cell, which is the intersection of a row and a column, can contain only one value. Mention the PK convention.
Unnormalized Entity
What if some homes have more than three owners? How would we write stored procs to read from this table?

11 To support multiple owners we need another entity: This will always be the case when an entity has a 1:M relationship with one of its attributes. Both entities are now in 1NF.
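The diagrams for these slides are not reproduced in this transcription, so the following is only a sketch of the idea, using Python's sqlite3 module. The table and column names (HomeUnnormalized, owner1..owner3, Home, Owner) and the sample data are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Repeating group: a fixed number of owner columns (violates 1NF).
    -- A fourth owner has nowhere to go.
    CREATE TABLE HomeUnnormalized (
        homeid  INTEGER PRIMARY KEY,
        address TEXT,
        owner1  TEXT,
        owner2  TEXT,
        owner3  TEXT
    );

    -- 1NF: owners move to their own entity, one row per owner.
    CREATE TABLE Home (
        homeid  INTEGER PRIMARY KEY,
        address TEXT
    );
    CREATE TABLE Owner (
        ownerid INTEGER PRIMARY KEY,
        homeid  INTEGER REFERENCES Home(homeid),
        name    TEXT
    );

    INSERT INTO Home VALUES (1, '12 Elm St');
    INSERT INTO Owner (homeid, name) VALUES
        (1, 'Ann'), (1, 'Bob'), (1, 'Cy'), (1, 'Dee');
""")

# Any number of owners per home is now just more rows, not more columns.
owner_count = conn.execute(
    "SELECT COUNT(*) FROM Owner WHERE homeid = 1"
).fetchone()[0]
print(owner_count)
```

Reading owners is then a single query against Owner, rather than code that has to inspect each of the owner1..owner3 columns in turn.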

12 Second Normal Form
To be in 2NF, a table must be in 1NF. In addition, each non-key attribute must be dependent on all parts of its primary key. There must be no partial key dependencies.
In the previous example:
The Home entity is not in 2NF.
    » The Mayor attribute doesn't depend on the entire primary key.
    » We need a new entity.
The Owner entity is not in 2NF.
    » The Price of Tea does not depend on the Owner.
    » We decide not to track this attribute.
In normalizing to 2NF, we attempt to reduce the amount of redundant data in a table by extracting it, placing it in new tables, and creating relationships between tables.

13 Tables are now in 2NF.

14 Third Normal Form
To be in 3NF, a table must be in 2NF. Additionally, all attributes that are not wholly dependent upon the primary key must be remodeled. Each table attribute can depend on nothing other than its primary key.
3NF = Every non-key attribute must depend on the key, the whole key, and nothing but the key.
In the previous example: Sun sign depends on birth date, so it should be stored in a different table.
A general modeling principle we see here is that when an attribute depends on another attribute, a new table will be necessary to model the relationship.

15 Entities are now in 3NF.

16 Modeling the M:M relationship How do we record the owners of individual homes?

17 We need an intermediate table that has a M:1 relationship with each of its parent tables.

18 The query below shows the name of each home's owner(s).
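The original query was shown as an image and is not part of this transcription. As a stand-in, here is a sketch of what such a query could look like, using Python's sqlite3 module. The Home, Owner, and HomeOwner tables, their columns, and the sample rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Home  (homeid  INTEGER PRIMARY KEY, address TEXT);
    CREATE TABLE Owner (ownerid INTEGER PRIMARY KEY, name    TEXT);

    -- Intermediate table: M:1 to Home and M:1 to Owner.
    CREATE TABLE HomeOwner (
        homeid  INTEGER REFERENCES Home(homeid),
        ownerid INTEGER REFERENCES Owner(ownerid),
        PRIMARY KEY (homeid, ownerid)
    );

    INSERT INTO Home  VALUES (1, '12 Elm St'), (2, '9 Oak Ave');
    INSERT INTO Owner VALUES (1, 'Ann'), (2, 'Bob');
    -- Ann co-owns both homes; Bob co-owns only the first.
    INSERT INTO HomeOwner VALUES (1, 1), (1, 2), (2, 1);
""")

# Walk from Home through the intermediate table to Owner.
home_owners = conn.execute("""
    SELECT h.address, o.name
    FROM Home h
    INNER JOIN HomeOwner ho ON ho.homeid = h.homeid
    INNER JOIN Owner o      ON o.ownerid = ho.ownerid
    ORDER BY h.address, o.name
""").fetchall()

for address, name in home_owners:
    print(address, name)
```

The composite primary key on HomeOwner is one reasonable choice: it prevents recording the same owner twice for the same home.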

19 General Remarks:
The definitions of normal forms provide guidelines for relational database design. Occasionally, it is necessary to stray from them to meet practical business requirements in an OLTP environment.
There is no single best way to normalize a database to conform to a specific set of business requirements.
Insert, Update, and Delete operations generally run more quickly in a normalized database, while complex Select statements generally run more slowly because they need more joins.

20 Reasons to denormalize The fundamental reason to denormalize is to improve query performance. Consider the case of City, State, and CityStateZip tables. These tables can be designed to conform to the third normal form. But each time you need to write a query to extract Customer data, you will need to join data from four tables. If no valid business reason exists to divide city, state, and ZIP Code information into separate tables, then it may make sense to denormalize. Dimension tables in a star schema are intentionally denormalized.

21 Normalized database:
    Many narrow tables (i.e. fewer columns)
    Optimized for Insert, Update, and Delete operations
    Slower Select statements because of the need for frequent join operations
    Few indexes
    Necessary for large OLTP applications
Non-normalized database:
    Fewer (but wider) tables
    Faster Select statements because we don't need to join as often
    Transactions are more problematic because of the need to maintain redundant instances of data during Insert, Update, and Delete operations
    Many indexes because data is relatively static
    Necessary for large relational OLAP applications

22 Data Warehouses Data warehouses and data marts are storage mechanisms for read-only, historical, aggregated data. Consider this example: we sell 2 products, dog food and cat food. Each day, we record the sales of each product. Here is some sample OLTP data for a couple of days:

23 Our data warehouse would usually not record this level of detail. Instead, in a warehouse we would summarize, or aggregate, the data to daily totals. Our records in the data warehouse might look something like this: Here we have reduced the number of records by aggregating the individual transaction records into daily records that show the number of each product purchased each day. We can certainly generate this data set from the OLTP system by running a query

24 but if we want to view our data as aggregated numbers broken down along a series of criteria (i.e. so-called "by" conditions), then query performance will improve if we store data in a denormalized format. That's exactly what we do when implementing a star schema.
It's important to realize that OLTP is not meant to be the basis of a decision support system. OLTP applications are optimized for activities such as recording (high numbers of) orders. A system optimized for processing transactions is not optimized to perform complex analyses designed to uncover hidden trends. Therefore, rather than tie up our OLTP system by performing expensive queries, we should build a less normalized structure that conforms better to our query needs.
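The step from individual transactions to daily totals can be sketched with a Group By query. The sample tables from the slides are not reproduced in this transcription, so the SalesTransaction table, its columns, and the dates and quantities below are made up for illustration; the example uses Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical OLTP table: one row per individual sale.
    CREATE TABLE SalesTransaction (
        saleid   INTEGER PRIMARY KEY,
        saledate TEXT,
        product  TEXT,
        quantity INTEGER
    );
    INSERT INTO SalesTransaction (saledate, product, quantity) VALUES
        ('2024-01-01', 'Dog Food', 2),
        ('2024-01-01', 'Dog Food', 1),
        ('2024-01-01', 'Cat Food', 3),
        ('2024-01-02', 'Dog Food', 4),
        ('2024-01-02', 'Cat Food', 1),
        ('2024-01-02', 'Cat Food', 2);
""")

# Aggregate six transaction rows into one row per product per day.
daily_totals = conn.execute("""
    SELECT saledate, product, SUM(quantity) AS quantity_sold
    FROM SalesTransaction
    GROUP BY saledate, product
    ORDER BY saledate, product
""").fetchall()

for row in daily_totals:
    print(row)
```

In a warehouse, rows shaped like daily_totals would be stored directly, so reports do not have to re-aggregate the transaction table on every request.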

25 The Warehouse
Typical business questions that drive warehouse design:
    How many units did we sell last week?
    Are overall sales of individual products or product categories higher or lower this year than in previous years?
    On a quarterly or monthly basis, are sales for some products/categories cyclical?
    In what regions are sales down this year?
    What products/categories in those regions account for the greatest percentage of the decrease?
Some characteristics of warehouse business questions:
    Many concern the element of time.
    Many questions require the aggregation of data; sums and counts are important in an OLAP environment, whereas individual transactions are important in an OLTP environment.
    Each question looks at data in terms of "by" conditions.
        "On a quarterly and then monthly basis, are Dairy Product sales cyclical?" = We need to see total sales of Dairy Products by quarter and by month.

26 These "by" conditions drive the design of our star schema. Each "by" condition is represented by a Dimension table.

27 Dimension Tables General Remarks
Product and Geography are common dimensions. Date/Time information is almost always stored in a Dimension table. If our data happen to start on a particular date, do we care what sales have been since that date, or do we care more about how one year's sales compare to other years? Comparing one year to another is a common form of trend analysis accomplished through the use of a star schema.

28 Dimension Table Structure
Dimension tables should have a single-field primary key. This key is often an identity column. The value of the primary key is irrelevant; our information is stored in the other fields in the table. Because the fields are the full descriptions, the dimension tables are often wide, i.e. they contain many large fields. For example, if we have a Product dimension, then we'll have fields in it that contain the description, the category name, the sub-category name, etc. These fields do not contain codes that link us to other tables. Dimension tables are often small in terms of row count relative to Fact tables.

29 Dimensional Hierarchies (Denormalization):
In a star schema, the entire hierarchy for a dimension is stored in its corresponding Dimension table in the data warehouse. The product dimension, for example, contains individual products. Products are normally grouped into categories, and these categories may contain sub-categories. For example, a product with a product number of M1652 may be a refrigerator. Thus it belongs in the major appliance category, and in the refrigerator sub-category. We may have more levels of sub-categories to further classify each product. In an OLAP environment, it is preferable to maintain the product hierarchy in a single table, although this hierarchy would certainly be distributed among Product, Category, and SubCategory tables in an OLTP environment.
This hierarchy allows us to perform drill-down functions on the data. We can run a query that calculates sums by category. We can then drill down into a category by calculating sums for its subcategories. We can then calculate the sums for the individual products in a particular subcategory. The actual sums we are calculating are based on numbers stored in the fact table.
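Drill-down can be sketched as the same aggregate query grouped at successively finer levels of the hierarchy. This example uses Python's sqlite3 module. Only the product number M1652 comes from the slide; the other product numbers, the column names, the sales figures, and the total_by helper are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- The whole product hierarchy lives in one denormalized dimension table.
    CREATE TABLE ProductDimension (
        productid   INTEGER PRIMARY KEY,
        productname TEXT,
        category    TEXT,
        subcategory TEXT
    );
    CREATE TABLE SalesFact (
        productid    INTEGER REFERENCES ProductDimension(productid),
        salesdollars REAL
    );
    INSERT INTO ProductDimension VALUES
        (1, 'M1652', 'Major Appliance', 'Refrigerator'),
        (2, 'M2001', 'Major Appliance', 'Refrigerator'),
        (3, 'W3300', 'Major Appliance', 'Washer');
    INSERT INTO SalesFact VALUES (1, 900.0), (2, 1100.0), (3, 500.0), (1, 900.0);
""")

HIERARCHY_LEVELS = ("category", "subcategory", "productname")

def total_by(level):
    """Sum sales dollars grouped by one level of the product hierarchy."""
    # The level is interpolated into the SQL text, so accept known columns only.
    if level not in HIERARCHY_LEVELS:
        raise ValueError(f"unknown hierarchy level: {level}")
    return conn.execute(f"""
        SELECT pd.{level}, SUM(sf.salesdollars)
        FROM SalesFact sf
        INNER JOIN ProductDimension pd ON pd.productid = sf.productid
        GROUP BY pd.{level}
        ORDER BY pd.{level}
    """).fetchall()

# Drill down: category, then sub-category, then individual product.
by_category = total_by("category")
by_subcategory = total_by("subcategory")
by_product = total_by("productname")
```

Because every level is a column of the same dimension table, each drill-down step changes only the Group By column and needs no additional joins.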

30 Fact tables
When we talk about the way we want to look at data, we usually want to see some sort of aggregated data. These data are called measures. Measures are numeric values that are measurable and additive. Sales dollars are a very common measure. The Number of Customers we have is also a typical measure. We'd probably track both of these by day.
Fact tables are used to store measures, or facts, which are numeric and additive across some or all dimensions. In the following star schema, sales dollars are numeric, and we can examine total sales in terms of product, category, and time period.
Fact tables are narrow in the sense that they contain few (and numeric) columns, but they do contain large numbers of rows. Fact tables are responsible for most of the disk space used in a warehouse.

31 Fact Table Granularity Granularity refers to the level of detail in a fact table and is one of the most important design decisions in data warehouse planning. Granularity is often determined by the time dimension. For example, you may elect to store only weekly or monthly totals for sales dollars. Granularity determines how far we can drill down without recourse to the source OLTP data. Many if not most OLAP systems have daily grain in the Time dimension. Selecting a finer grain results in more records in the fact table. Choose data types for fact table columns that keep the table as small as possible.

32 Aggregations
Fact table data consists of aggregations that are based on the fact table's granularity. Frequently we'll want to aggregate to a higher level. We may choose to keep total sales dollars at a quarterly or monthly level. We may be interested in only a particular product or category in this case. A better alternative is to build a cube structure

33 Simple Star Schema: To obtain total sales for all major appliances during March of 1999:

Select Sum(sf.salesdollars) As TotalSales
From SalesFact sf
Inner Join TimeDimension td On td.timeid = sf.timeid
Inner Join ProductDimension pd On pd.productid = sf.productid
Where pd.category = 'Major Appliance'
  And td.month = 3
  And td.year = 1999
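The query can be exercised against a tiny star schema. This is a sketch using Python's sqlite3 module: the table and column names referenced by the query come from the slide, while the remaining columns (day, productname), the column types, and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TimeDimension (
        timeid INTEGER PRIMARY KEY,
        day    INTEGER,
        month  INTEGER,
        year   INTEGER
    );
    CREATE TABLE ProductDimension (
        productid   INTEGER PRIMARY KEY,
        productname TEXT,
        category    TEXT
    );
    CREATE TABLE SalesFact (
        timeid       INTEGER REFERENCES TimeDimension(timeid),
        productid    INTEGER REFERENCES ProductDimension(productid),
        salesdollars REAL
    );

    INSERT INTO TimeDimension VALUES
        (1, 15, 3, 1999), (2, 16, 3, 1999), (3, 1, 4, 1999);
    INSERT INTO ProductDimension VALUES
        (1, 'Refrigerator', 'Major Appliance'),
        (2, 'Toaster',      'Small Appliance');
    INSERT INTO SalesFact VALUES
        (1, 1, 800.0),   -- March 1999, major appliance: counted
        (2, 1, 700.0),   -- March 1999, major appliance: counted
        (2, 2, 30.0),    -- March 1999, small appliance: filtered out
        (3, 1, 900.0);   -- April 1999: filtered out
""")

# Each "by" condition is a join to one dimension plus a filter on it.
total_sales = conn.execute("""
    SELECT SUM(sf.salesdollars) AS TotalSales
    FROM SalesFact sf
    INNER JOIN TimeDimension td    ON td.timeid = sf.timeid
    INNER JOIN ProductDimension pd ON pd.productid = sf.productid
    WHERE pd.category = 'Major Appliance'
      AND td.month = 3
      AND td.year = 1999
""").fetchone()[0]

print(total_sales)  # 1500.0
```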

34 Snowflake Schemas
Sometimes dimension tables have hierarchies broken out into separate tables. This results in a different schema type known as a snowflake. This is a more normalized structure, but it generally leads to more difficult queries and slower response times, because each level broken out of a dimension adds a join. It does use less disk space than a star schema that contains the same data.
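To make the extra join concrete, here is a sketch of a snowflaked product dimension, using Python's sqlite3 module. The CategoryDimension table and all column names and sample rows are hypothetical; the slides do not specify how the hierarchy is broken out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Snowflake: the category level is broken out of the product dimension,
    -- so each category name is stored once.
    CREATE TABLE CategoryDimension (
        categoryid   INTEGER PRIMARY KEY,
        categoryname TEXT
    );
    CREATE TABLE ProductDimension (
        productid   INTEGER PRIMARY KEY,
        productname TEXT,
        categoryid  INTEGER REFERENCES CategoryDimension(categoryid)
    );
    CREATE TABLE SalesFact (
        productid    INTEGER REFERENCES ProductDimension(productid),
        salesdollars REAL
    );

    INSERT INTO CategoryDimension VALUES
        (1, 'Major Appliance'), (2, 'Small Appliance');
    INSERT INTO ProductDimension VALUES
        (1, 'Refrigerator', 1), (2, 'Toaster', 2);
    INSERT INTO SalesFact VALUES (1, 800.0), (1, 700.0), (2, 30.0);
""")

# Filtering by category now takes two joins instead of one.
snowflake_total = conn.execute("""
    SELECT SUM(sf.salesdollars)
    FROM SalesFact sf
    INNER JOIN ProductDimension pd  ON pd.productid = sf.productid
    INNER JOIN CategoryDimension cd ON cd.categoryid = pd.categoryid
    WHERE cd.categoryname = 'Major Appliance'
""").fetchone()[0]

print(snowflake_total)  # 1500.0
```

In the equivalent star schema the category name would be a column of ProductDimension, repeated on every product row, and the second join would disappear.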

35 Graphical comparison of Star and Snowflake schemas (two diagrams: Star Schema, Snowflake Schema)

36 Further Warehouse Design Considerations Changing Dimensions In the schema below, consider a scenario in which we have realigned some of our stores, placing them in different territories and regions.

37 In the StoreDimension table, we have each store in a particular region, territory, and zone. If we simply update the StoreDimension table with new territory/region information, and then examine historical sales for a region, the numbers will no longer be accurate. To address this issue, consider creating new records for affected stores. Each new record contains the store's new region, while the old store record stays intact along with the old regional sales data. This approach, however, prevents us from comparing a store's current sales to its historical sales unless we keep track of its previous StoreID. This may require an extra field called PreviousStoreID or something similar. There are no right and wrong answers. Each case may require a different solution.
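One way to picture the new-record approach with a PreviousStoreID field is the sketch below, using Python's sqlite3 module. The StoreDimension and PreviousStoreID names follow the slide; the other columns, the SalesFact layout, and the sample data are assumptions, and this is only one of several possible designs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE StoreDimension (
        storeid         INTEGER PRIMARY KEY,
        storename       TEXT,
        region          TEXT,
        previousstoreid INTEGER REFERENCES StoreDimension(storeid)
    );
    CREATE TABLE SalesFact (
        storeid      INTEGER REFERENCES StoreDimension(storeid),
        salesdollars REAL
    );

    -- The store as originally aligned, with its historical sales.
    INSERT INTO StoreDimension VALUES (1, 'Main St', 'East', NULL);
    INSERT INTO SalesFact VALUES (1, 100.0), (1, 150.0);

    -- Realignment: add a new record instead of updating the old one.
    INSERT INTO StoreDimension VALUES (2, 'Main St', 'West', 1);
    INSERT INTO SalesFact VALUES (2, 200.0);
""")

# Historical regional totals stay accurate: old sales still roll up to East.
sales_by_region = conn.execute("""
    SELECT sd.region, SUM(sf.salesdollars)
    FROM SalesFact sf
    INNER JOIN StoreDimension sd ON sd.storeid = sf.storeid
    GROUP BY sd.region
    ORDER BY sd.region
""").fetchall()

# previousstoreid links the new record to the old one, so the store's
# current sales can still be compared with its earlier sales.
store_history = conn.execute("""
    SELECT cur.storename,
           (SELECT SUM(salesdollars) FROM SalesFact
             WHERE storeid = cur.storeid)         AS current_sales,
           (SELECT SUM(salesdollars) FROM SalesFact
             WHERE storeid = cur.previousstoreid) AS previous_sales
    FROM StoreDimension cur
    WHERE cur.previousstoreid IS NOT NULL
""").fetchall()
```

This sketch follows the link back only one step; a store that is realigned several times would need a chain of lookups or a different tracking scheme.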

38 When building an enterprise warehouse from local data marts:
    It is necessary to produce a set of conformed dimensions.
    It will also be necessary to standardize the definitions of facts.
A conformed dimension is a dimension that means the same thing with every possible fact table to which it can be joined. Generally, this means that a conformed dimension is identical in each data mart.
The conformed Product dimension is the enterprise's agreed-upon master list of products, including all product attributes and all product rollups such as category, subcategory, and department.
The conformed Calendar dimension will almost always be a table of individual days, spanning a decade or more. Each day will have many useful attributes drawn from the legal calendars of the various states and countries the enterprise deals with, as well as special fiscal calendar periods and marketing seasons relevant only to internal managers.
Most conformed dimensions will naturally be defined at the most granular level possible. The grain of the Customer dimension will be the individual customer.

39 Simplified Star Schema with Conformed Dimensions

40 Permissible Variations of Conformed Dimensions It is possible to create a subset of a conformed dimension table for certain data marts if you know that the domain of the associated fact table only contains that subset. For example, the Product table for a specific data mart may be restricted so as to include only those products manufactured at that location, if the data mart in question pertains to that location only.

41 Links:
    Wikipedia page on normalization
    Databases.About.Com page on normalization
    MSDN Glossary
    Oracle-specific site where I got some schema diagrams
    Ralph Kimball's Data Warehousing site
    Kimball on Fact and Dimension Tables
    BI and Data Warehouse Glossary


More information

Data Warehousing and OLAP

Data Warehousing and OLAP Data Warehousing and OLAP INFO 330 Slides courtesy of Mirek Riedewald Motivation Large retailer Several databases: inventory, personnel, sales etc. High volume of updates Management requirements Efficient

More information

CS 1655 / Spring 2013! Secure Data Management and Web Applications

CS 1655 / Spring 2013! Secure Data Management and Web Applications CS 1655 / Spring 2013 Secure Data Management and Web Applications 03 Data Warehousing Alexandros Labrinidis University of Pittsburgh What is a Data Warehouse A data warehouse: archives information gathered

More information

DATA MINING AND WAREHOUSING

DATA MINING AND WAREHOUSING DATA MINING AND WAREHOUSING Qno Question Answer 1 Define data warehouse? Data warehouse is a subject oriented, integrated, time-variant, and nonvolatile collection of data that supports management's decision-making

More information

Data Warehousing and Decision Support

Data Warehousing and Decision Support Data Warehousing and Decision Support Chapter 23, Part A Database Management Systems, 2 nd Edition. R. Ramakrishnan and J. Gehrke 1 Introduction Increasingly, organizations are analyzing current and historical

More information

CHAPTER 8: ONLINE ANALYTICAL PROCESSING(OLAP)

CHAPTER 8: ONLINE ANALYTICAL PROCESSING(OLAP) CHAPTER 8: ONLINE ANALYTICAL PROCESSING(OLAP) INTRODUCTION A dimension is an attribute within a multidimensional model consisting of a list of values (called members). A fact is defined by a combination

More information

The strategic advantage of OLAP and multidimensional analysis

The strategic advantage of OLAP and multidimensional analysis IBM Software Business Analytics Cognos Enterprise The strategic advantage of OLAP and multidimensional analysis 2 The strategic advantage of OLAP and multidimensional analysis Overview Online analytical

More information

Data Warehousing and Decision Support

Data Warehousing and Decision Support Data Warehousing and Decision Support [R&G] Chapter 23, Part A CS 4320 1 Introduction Increasingly, organizations are analyzing current and historical data to identify useful patterns and support business

More information

Data Warehouse and Data Mining

Data Warehouse and Data Mining Data Warehouse and Data Mining Lecture No. 04-06 Data Warehouse Architecture Naeem Ahmed Email: naeemmahoto@gmail.com Department of Software Engineering Mehran Univeristy of Engineering and Technology

More information

Data Warehouses and OLAP. Database and Information Systems. Data Warehouses and OLAP. Data Warehouses and OLAP

Data Warehouses and OLAP. Database and Information Systems. Data Warehouses and OLAP. Data Warehouses and OLAP Database and Information Systems 11. Deductive Databases 12. Data Warehouses and OLAP 13. Index Structures for Similarity Queries 14. Data Mining 15. Semi-Structured Data 16. Document Retrieval 17. Web

More information

TIM 50 - Business Information Systems

TIM 50 - Business Information Systems TIM 50 - Business Information Systems Lecture 15 UC Santa Cruz May 20, 2014 Announcements DB 2 Due Tuesday Next Week The Database Approach to Data Management Database: Collection of related files containing

More information

Data-Driven Driven Business Intelligence Systems: Parts I. Lecture Outline. Learning Objectives

Data-Driven Driven Business Intelligence Systems: Parts I. Lecture Outline. Learning Objectives Data-Driven Driven Business Intelligence Systems: Parts I Week 5 Dr. Jocelyn San Pedro School of Information Management & Systems Monash University IMS3001 BUSINESS INTELLIGENCE SYSTEMS SEM 1, 2004 Lecture

More information

Seminars of Software and Services for the Information Society. Data Warehousing Design Issues

Seminars of Software and Services for the Information Society. Data Warehousing Design Issues DIPARTIMENTO DI INGEGNERIA INFORMATICA AUTOMATICA E GESTIONALE ANTONIO RUBERTI Master of Science in Engineering in Computer Science (MSE-CS) Seminars in Software and Services for the Information Society

More information

OLAP2 outline. Multi Dimensional Data Model. A Sample Data Cube

OLAP2 outline. Multi Dimensional Data Model. A Sample Data Cube OLAP2 outline Multi Dimensional Data Model Need for Multi Dimensional Analysis OLAP Operators Data Cube Demonstration Using SQL Multi Dimensional Data Model Multi dimensional analysis is a popular approach

More information

Advanced Multidimensional Reporting

Advanced Multidimensional Reporting Guideline Advanced Multidimensional Reporting Product(s): IBM Cognos 8 Report Studio Area of Interest: Report Design Advanced Multidimensional Reporting 2 Copyright Copyright 2008 Cognos ULC (formerly

More information

Unit 7: Basics in MS Power BI for Excel 2013 M7-5: OLAP

Unit 7: Basics in MS Power BI for Excel 2013 M7-5: OLAP Unit 7: Basics in MS Power BI for Excel M7-5: OLAP Outline: Introduction Learning Objectives Content Exercise What is an OLAP Table Operations: Drill Down Operations: Roll Up Operations: Slice Operations:

More information

DC Area Business Objects Crystal User Group (DCABOCUG) Data Warehouse Architectures for Business Intelligence Reporting.

DC Area Business Objects Crystal User Group (DCABOCUG) Data Warehouse Architectures for Business Intelligence Reporting. DC Area Business Objects Crystal User Group (DCABOCUG) Data Warehouse Architectures for Business Intelligence Reporting April 14, 2009 Whitemarsh Information Systems Corporation 2008 Althea Lane Bowie,

More information

Database design View Access patterns Need for separate data warehouse:- A multidimensional data model:-

Database design View Access patterns Need for separate data warehouse:- A multidimensional data model:- UNIT III: Data Warehouse and OLAP Technology: An Overview : What Is a Data Warehouse? A Multidimensional Data Model, Data Warehouse Architecture, Data Warehouse Implementation, From Data Warehousing to

More information

Star Schema Design (Additonal Material; Partly Covered in Chapter 8) Class 04: Star Schema Design 1

Star Schema Design (Additonal Material; Partly Covered in Chapter 8) Class 04: Star Schema Design 1 Star Schema Design (Additonal Material; Partly Covered in Chapter 8) Class 04: Star Schema Design 1 Star Schema Overview Star Schema: A simple database architecture used extensively in analytical applications,

More information

5-1McGraw-Hill/Irwin. Copyright 2007 by The McGraw-Hill Companies, Inc. All rights reserved.

5-1McGraw-Hill/Irwin. Copyright 2007 by The McGraw-Hill Companies, Inc. All rights reserved. 5-1McGraw-Hill/Irwin Copyright 2007 by The McGraw-Hill Companies, Inc. All rights reserved. 5 hapter Data Resource Management Data Concepts Database Management Types of Databases McGraw-Hill/Irwin Copyright

More information

Best Practices in Data Modeling. Dan English

Best Practices in Data Modeling. Dan English Best Practices in Data Modeling Dan English Objectives Understand how QlikView is Different from SQL Understand How QlikView works with(out) a Data Warehouse Not Throw Baby out with the Bathwater Adopt

More information

Overview. Introduction to Data Warehousing and Business Intelligence. BI Is Important. What is Business Intelligence (BI)?

Overview. Introduction to Data Warehousing and Business Intelligence. BI Is Important. What is Business Intelligence (BI)? Introduction to Data Warehousing and Business Intelligence Overview Why Business Intelligence? Data analysis problems Data Warehouse (DW) introduction A tour of the coming DW lectures DW Applications Loosely

More information

Data Warehouse. Asst.Prof.Dr. Pattarachai Lalitrojwong

Data Warehouse. Asst.Prof.Dr. Pattarachai Lalitrojwong Data Warehouse Asst.Prof.Dr. Pattarachai Lalitrojwong Faculty of Information Technology King Mongkut s Institute of Technology Ladkrabang Bangkok 10520 pattarachai@it.kmitl.ac.th The Evolution of Data

More information

Chapter 6. Foundations of Business Intelligence: Databases and Information Management VIDEO CASES

Chapter 6. Foundations of Business Intelligence: Databases and Information Management VIDEO CASES Chapter 6 Foundations of Business Intelligence: Databases and Information Management VIDEO CASES Case 1a: City of Dubuque Uses Cloud Computing and Sensors to Build a Smarter, Sustainable City Case 1b:

More information

Proceedings of the IE 2014 International Conference AGILE DATA MODELS

Proceedings of the IE 2014 International Conference  AGILE DATA MODELS AGILE DATA MODELS Mihaela MUNTEAN Academy of Economic Studies, Bucharest mun61mih@yahoo.co.uk, Mihaela.Muntean@ie.ase.ro Abstract. In last years, one of the most popular subjects related to the field of

More information

Create Cube From Star Schema Grouping Framework Manager

Create Cube From Star Schema Grouping Framework Manager Create Cube From Star Schema Grouping Framework Manager Create star schema groupings to provide authors with logical groupings of query Connect to an OLAP data source (cube) in a Framework Manager project

More information

1. Analytical queries on the dimensionally modeled database can be significantly simpler to create than on the equivalent nondimensional database.

1. Analytical queries on the dimensionally modeled database can be significantly simpler to create than on the equivalent nondimensional database. 1. Creating a data warehouse involves using the functionalities of database management software to implement the data warehouse model as a collection of physically created and mutually connected database

More information

Evolution of Database Systems

Evolution of Database Systems Evolution of Database Systems Krzysztof Dembczyński Intelligent Decision Support Systems Laboratory (IDSS) Poznań University of Technology, Poland Intelligent Decision Support Systems Master studies, second

More information

SAS Data Integration Studio 3.3. User s Guide

SAS Data Integration Studio 3.3. User s Guide SAS Data Integration Studio 3.3 User s Guide The correct bibliographic citation for this manual is as follows: SAS Institute Inc. 2006. SAS Data Integration Studio 3.3: User s Guide. Cary, NC: SAS Institute

More information

Data Warehouse and Data Mining

Data Warehouse and Data Mining Data Warehouse and Data Mining Lecture No. 02 Introduction to Data Warehouse Naeem Ahmed Email: naeemmahoto@gmail.com Department of Software Engineering Mehran Univeristy of Engineering and Technology

More information

Department of Industrial Engineering. Sharif University of Technology. Operational and enterprises systems. Exciting directions in systems

Department of Industrial Engineering. Sharif University of Technology. Operational and enterprises systems. Exciting directions in systems Department of Industrial Engineering Sharif University of Technology Session# 9 Contents: The role of managers in Information Technology (IT) Organizational Issues Information Technology Operational and

More information

A Star Schema Has One To Many Relationship Between A Dimension And Fact Table

A Star Schema Has One To Many Relationship Between A Dimension And Fact Table A Star Schema Has One To Many Relationship Between A Dimension And Fact Table Many organizations implement star and snowflake schema data warehouse The fact table has foreign key relationships to one or

More information

Data Warehousing. Data Warehousing and Mining. Lecture 8. by Hossen Asiful Mustafa

Data Warehousing. Data Warehousing and Mining. Lecture 8. by Hossen Asiful Mustafa Data Warehousing Data Warehousing and Mining Lecture 8 by Hossen Asiful Mustafa Databases Databases are developed on the IDEA that DATA is one of the critical materials of the Information Age Information,

More information

ACS-2914 Normalization March 2009 NORMALIZATION 2. Ron McFadyen 1. Normalization 3. De-normalization 3

ACS-2914 Normalization March 2009 NORMALIZATION 2. Ron McFadyen 1. Normalization 3. De-normalization 3 NORMALIZATION 2 Normalization 3 De-normalization 3 Functional Dependencies 4 Generating functional dependency maps from database design maps 5 Anomalies 8 Partial Functional Dependencies 10 Transitive

More information

ETL and OLAP Systems

ETL and OLAP Systems ETL and OLAP Systems Krzysztof Dembczyński Intelligent Decision Support Systems Laboratory (IDSS) Poznań University of Technology, Poland Software Development Technologies Master studies, first semester

More information

Data warehouses Decision support The multidimensional model OLAP queries

Data warehouses Decision support The multidimensional model OLAP queries Data warehouses Decision support The multidimensional model OLAP queries Traditional DBMSs are used by organizations for maintaining data to record day to day operations On-line Transaction Processing

More information

CHAPTER 3 Implementation of Data warehouse in Data Mining

CHAPTER 3 Implementation of Data warehouse in Data Mining CHAPTER 3 Implementation of Data warehouse in Data Mining 3.1 Introduction to Data Warehousing A data warehouse is storage of convenient, consistent, complete and consolidated data, which is collected

More information

Acknowledgment. MTAT Data Mining. Week 7: Online Analytical Processing and Data Warehouses. Typical Data Analysis Process.

Acknowledgment. MTAT Data Mining. Week 7: Online Analytical Processing and Data Warehouses. Typical Data Analysis Process. MTAT.03.183 Data Mining Week 7: Online Analytical Processing and Data Warehouses Marlon Dumas marlon.dumas ät ut. ee Acknowledgment This slide deck is a mashup of the following publicly available slide

More information

Chapter 6 VIDEO CASES

Chapter 6 VIDEO CASES Chapter 6 Foundations of Business Intelligence: Databases and Information Management VIDEO CASES Case 1a: City of Dubuque Uses Cloud Computing and Sensors to Build a Smarter, Sustainable City Case 1b:

More information

Data Warehouse Logical Design. Letizia Tanca Politecnico di Milano (with the kind support of Rosalba Rossato)

Data Warehouse Logical Design. Letizia Tanca Politecnico di Milano (with the kind support of Rosalba Rossato) Data Warehouse Logical Design Letizia Tanca Politecnico di Milano (with the kind support of Rosalba Rossato) Data Mart logical models MOLAP (Multidimensional On-Line Analytical Processing) stores data

More information

This tutorial will help computer science graduates to understand the basic-to-advanced concepts related to data warehousing.

This tutorial will help computer science graduates to understand the basic-to-advanced concepts related to data warehousing. About the Tutorial A data warehouse is constructed by integrating data from multiple heterogeneous sources. It supports analytical reporting, structured and/or ad hoc queries and decision making. This

More information

Data Warehousing. Adopted from Dr. Sanjay Gunasekaran

Data Warehousing. Adopted from Dr. Sanjay Gunasekaran Data Warehousing Adopted from Dr. Sanjay Gunasekaran Main Topics Overview of Data Warehouse Concept of Data Conversion Importance of Data conversion and the steps involved Common Industry Methodology Outline

More information

Database Vs. Data Warehouse

Database Vs. Data Warehouse Database Vs. Data Warehouse Similarities and differences Databases and data warehouses are used to generate different types of information. Information generated by both are used for different purposes.

More information

Data Warehousing and OLAP Technology for Primary Industry

Data Warehousing and OLAP Technology for Primary Industry Data Warehousing and OLAP Technology for Primary Industry Taehan Kim 1), Sang Chan Park 2) 1) Department of Industrial Engineering, KAIST (taehan@kaist.ac.kr) 2) Department of Industrial Engineering, KAIST

More information

The Data Organization

The Data Organization C V I T F E P A O TM The Data Organization Best Practices Metadata Dictionary Application Architecture Prepared by Rainer Schoenrank January 2017 Table of Contents 1. INTRODUCTION... 3 1.1 PURPOSE OF THE

More information

Development of an interface that allows MDX based data warehouse queries by less experienced users

Development of an interface that allows MDX based data warehouse queries by less experienced users Development of an interface that allows MDX based data warehouse queries by less experienced users Mariana Duprat André Monat Escola Superior de Desenho Industrial 400 Introduction Data analysis is a fundamental

More information

Handout 12 Data Warehousing and Analytics.

Handout 12 Data Warehousing and Analytics. Handout 12 CS-605 Spring 17 Page 1 of 6 Handout 12 Data Warehousing and Analytics. Operational (aka transactional) system a system that is used to run a business in real time, based on current data; also

More information

IST722 Data Warehousing

IST722 Data Warehousing IST722 Data Warehousing Dimensional Modeling Michael A. Fudge, Jr. Pop Quiz: T/F 1. The business meaning of a fact table row is known as a dimension. 2. A dimensional data model is optimized for maximum

More information

Oracle Database 11g: Data Warehousing Fundamentals

Oracle Database 11g: Data Warehousing Fundamentals Oracle Database 11g: Data Warehousing Fundamentals Duration: 3 Days What you will learn This Oracle Database 11g: Data Warehousing Fundamentals training will teach you about the basic concepts of a data

More information

Chapter 3. Foundations of Business Intelligence: Databases and Information Management

Chapter 3. Foundations of Business Intelligence: Databases and Information Management Chapter 3 Foundations of Business Intelligence: Databases and Information Management THE DATA HIERARCHY TRADITIONAL FILE PROCESSING Organizing Data in a Traditional File Environment Problems with the traditional

More information

WKU-MIS-B10 Data Management: Warehousing, Analyzing, Mining, and Visualization. Management Information Systems

WKU-MIS-B10 Data Management: Warehousing, Analyzing, Mining, and Visualization. Management Information Systems Management Information Systems Management Information Systems B10. Data Management: Warehousing, Analyzing, Mining, and Visualization Code: 166137-01+02 Course: Management Information Systems Period: Spring

More information

Full file at

Full file at Chapter 2 Data Warehousing True-False Questions 1. A real-time, enterprise-level data warehouse combined with a strategy for its use in decision support can leverage data to provide massive financial benefits

More information

PASS4TEST. IT Certification Guaranteed, The Easy Way! We offer free update service for one year

PASS4TEST. IT Certification Guaranteed, The Easy Way!   We offer free update service for one year PASS4TEST IT Certification Guaranteed, The Easy Way! \ http://www.pass4test.com We offer free update service for one year Exam : BI0-130 Title : Cognos 8 BI Modeler Vendors : COGNOS Version : DEMO Get

More information

Normalization in DBMS

Normalization in DBMS Unit 4: Normalization 4.1. Need of Normalization (Consequences of Bad Design-Insert, Update & Delete Anomalies) 4.2. Normalization 4.2.1. First Normal Form 4.2.2. Second Normal Form 4.2.3. Third Normal

More information

CSE 544 Principles of Database Management Systems. Alvin Cheung Fall 2015 Lecture 8 - Data Warehousing and Column Stores

CSE 544 Principles of Database Management Systems. Alvin Cheung Fall 2015 Lecture 8 - Data Warehousing and Column Stores CSE 544 Principles of Database Management Systems Alvin Cheung Fall 2015 Lecture 8 - Data Warehousing and Column Stores Announcements Shumo office hours change See website for details HW2 due next Thurs

More information

The COSMIC Functional Size Measurement Method Version 4.0.1

The COSMIC Functional Size Measurement Method Version 4.0.1 The COSMIC Functional Size Measurement Method Version 4.0.1 Guideline for sizing Data Warehouse Application Software Version 1.1 April 2015 Acknowledgements Reviewers of v1.1 (alphabetical order) Diana

More information