CS 1655 / Spring 2013 - Secure Data Management and Web Applications

CS 1655 / Spring 2013
Secure Data Management and Web Applications
03 Data Warehousing
Alexandros Labrinidis, University of Pittsburgh

What is a Data Warehouse?
- A data warehouse archives information gathered from multiple sources and stores it under a unified schema, at a single site.
- Important for large businesses, which generate data from multiple divisions, possibly at multiple sites.

Loading the Data Warehouse
- Data is periodically extracted.
- Data is cleansed and transformed.
- Users query the data warehouse.
[Figure: Source Systems (OLTP) -> Data Staging Area -> Data Warehouse]

Data Analysis and OLAP
- Aggregate functions summarize large volumes of data.
- Online Analytical Processing (OLAP): interactive analysis of data, allowing data to be summarized and viewed in different ways in an online fashion (with negligible delay).

Multidimensional Data
- Data that can be modeled as dimension attributes and measure attributes are called multidimensional data.
- Given a relation used for data analysis, we can identify measure attributes: those that measure some value and can be aggregated upon. E.g., the attribute "number" of the sales relation is a measure attribute, since it measures the number of units sold.
- Some of the other attributes of the relation are identified as dimension attributes, since they define the dimensions on which measure attributes, and summaries of measure attributes, are viewed.

Cross-tab
- A cross-tab is a table where values for one of the dimension attributes form the row headers and values for another dimension attribute form the column headers.
- Other dimension attributes are listed on top.
- Values in individual cells are (aggregates of) the measure attribute values for the combination of dimension attribute values that specifies the cell.
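As a rough illustration of the cross-tab definition, the sketch below sums a measure attribute for each pair of dimension values. The rows and the helper name cross_tab are made up for the example and are not part of the lecture.

```python
from collections import defaultdict

# Hypothetical rows of a sales relation: (state, month, color, number).
# state, month and color are dimension attributes; number is the measure.
sales = [
    ("PA", "Jul", "Red", 20), ("PA", "Jul", "Blue", 25),
    ("OH", "Jul", "Red", 33), ("PA", "Aug", "Gray", 50),
    ("OH", "Aug", "Blue", 36),
]

def cross_tab(rows, row_dim, col_dim):
    """Sum the measure (last field) for each (row_dim, col_dim) value pair."""
    cells = defaultdict(int)
    for row in rows:
        cells[(row[row_dim], row[col_dim])] += row[-1]
    return dict(cells)

# Month (index 1) gives the row headers and state (index 0) the column
# headers; color is aggregated away.
tab = cross_tab(sales, row_dim=1, col_dim=0)
print(tab[("Jul", "PA")])
```

Each key of tab is one cell of the cross-tab; the two Jul/PA rows with different colors collapse into a single cell.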

Cross-tab example
[Figure: cross-tab example]

Relational Representation of Cross-tabs
- Cross-tabs can be represented as relations.
- The value "all" is used to represent aggregates.
- The SQL:1999 standard actually uses null values in place of "all". More on this later.

Cross-tabs of >2 dims?
- The generalization of a cross-tab to more than two dimensions is a data cube.
- 3-dimensional data cubes are easy to visualize.
- A cross-tab can be used as a two-dimensional view of any n-dimensional data cube.

Data Cube Example
[Figure: data cube example]

Data Cube
- Axes of the cube represent attributes of the data records, e.g. color, month, state. These are called dimensions.
- Cells hold aggregated measurements, e.g. total $ sales, number of autos sold. These are called facts.
- Real data cubes have >> 3 dimensions.
[Figure: "Auto Sales" cube with axes color (Red, Blue, Gray), month (Jul, Aug, Sep) and state (MD, OH, PA)]

Slicing and Dicing
[Figure: slices of the Auto Sales cube, e.g. the Blue slice over months Jul, Aug, Sep and states MD, OH, PA]
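The slicing and dicing slide is a figure only, so here is a small sketch of what the two operations are usually taken to mean: a slice fixes one dimension to a single value, and a dice restricts several dimensions to sets of values. The cell values and the function names slice_ and dice are assumptions made for illustration.

```python
# Hypothetical cube cells keyed by (color, month, state); values are autos sold.
cube = {
    ("Red", "Jul", "PA"): 12, ("Red", "Aug", "OH"): 9,
    ("Blue", "Jul", "PA"): 15, ("Blue", "Aug", "MD"): 11,
    ("Blue", "Sep", "OH"): 7, ("Gray", "Sep", "MD"): 14,
}
DIMS = ("color", "month", "state")

def dice(cells, **allowed):
    """Keep the cells whose coordinates fall inside the allowed value sets."""
    def keep(key):
        return all(key[DIMS.index(d)] in vals for d, vals in allowed.items())
    return {k: v for k, v in cells.items() if keep(k)}

def slice_(cells, dim, value):
    """Fix one dimension to a single value: a dice with a one-element set."""
    return dice(cells, **{dim: {value}})

blue = slice_(cube, "color", "Blue")
blue_jul_aug = dice(cube, color={"Blue"}, month={"Jul", "Aug"})
print(sorted(blue))
print(sorted(blue_jul_aug))
```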

Querying the Data Cube
- Cross-tabulation ("cross-tab" for short)
  - Report data grouped by 2 dimensions
  - Aggregate across other dimensions
  - Include subtotals
- Operations on a cross-tab
  - Roll up (further aggregation)
  - Drill down (less aggregation)

Number of Autos Sold
          PA    OH    MD   Total
  Jul     45    33    30     108
  Aug     50    36    42     128
  Sep     38    31    40     109
  Total  133   100   112     345

Roll Up and Drill Down
Starting from the cross-tab above:

Roll up by Month: Number of Autos Sold
          PA    OH    MD   Total
  Total  133   100   112     345

Drill down by Color: Number of Autos Sold
          PA    OH    MD   Total
  Red     40    29    40     109
  Blue    45    31    37     113
  Gray    48    40    35     123
  Total  133   100   112     345
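The roll-up by month can be reproduced from the cells of the month-by-state cross-tab. This is a minimal sketch; the function name roll_up and the dictionary layout are assumptions, while the numbers are the ones on the slide.

```python
from collections import defaultdict

# Cells of the month-by-state cross-tab: (month, state) -> autos sold.
by_month_state = {
    ("Jul", "PA"): 45, ("Jul", "OH"): 33, ("Jul", "MD"): 30,
    ("Aug", "PA"): 50, ("Aug", "OH"): 36, ("Aug", "MD"): 42,
    ("Sep", "PA"): 38, ("Sep", "OH"): 31, ("Sep", "MD"): 40,
}

def roll_up(cells, drop):
    """Aggregate away the dimension at position `drop` of each key."""
    out = defaultdict(int)
    for key, value in cells.items():
        out[key[:drop] + key[drop + 1:]] += value
    return dict(out)

by_state = roll_up(by_month_state, drop=0)   # roll up by month
grand_total = roll_up(by_state, drop=0)      # roll up by state as well
print(by_state)
print(grand_total)
```

Drilling down by color cannot be computed from these cells alone: it needs the finer-grained data that still carries the color dimension.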

Full Data Cube with Subtotals
- Pre-computation of aggregates gives fast answers to OLAP queries.
- Ideally, pre-compute all 2^n types of subtotals.
- Otherwise, perform aggregation as needed.
- Coarser-grained totals can be computed from finer-grained totals, but not the other way around.

Data Cube Lattice
  Level 3: (State, Month, Color)
  Level 2: (State, Month)   (State, Color)   (Month, Color)
  Level 1: (State)   (Month)   (Color)
  Level 0: Total
Rolling up moves toward Total; drilling down moves toward (State, Month, Color).
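The 2^n count and the lattice levels can be checked by enumerating every subset of the dimensions. A small sketch, with the helper name lattice chosen for the example:

```python
from itertools import combinations

def lattice(dims):
    """All 2**n subsets of the dimensions, finest grouping first."""
    return [combo for size in range(len(dims), -1, -1)
            for combo in combinations(dims, size)]

groupings = lattice(("State", "Month", "Color"))
for g in groupings:
    print(g if g else "Total")
```

For three dimensions this lists eight groupings, from (State, Month, Color) down to the empty grouping, which is the overall total.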

Hierarchies on Dimensions
- A hierarchy on dimension attributes lets dimensions be viewed at different levels of detail.
- E.g. the dimension DateTime can be used to aggregate by hour of day, date, day of week, month, quarter or year.

Cross-tabs With Hierarchies
- Cross-tabs can be easily extended to deal with hierarchies.
- Can drill down or roll up on a hierarchy.
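One way to picture a hierarchy on a DateTime dimension is as a set of functions that map each timestamp to a coarser level. The facts and the LEVELS table below are hypothetical and only sketch the idea.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical (timestamp, units sold) facts.
facts = [
    (datetime(2013, 1, 15, 9), 3), (datetime(2013, 2, 3, 14), 5),
    (datetime(2013, 4, 20, 9), 2), (datetime(2013, 11, 2, 18), 4),
]

# Each level of the hierarchy maps a timestamp to a coarser value.
LEVELS = {
    "month": lambda t: (t.year, t.month),
    "quarter": lambda t: (t.year, (t.month - 1) // 3 + 1),
    "year": lambda t: t.year,
}

def aggregate(rows, level):
    """Sum units per value of the chosen hierarchy level."""
    totals = defaultdict(int)
    for ts, units in rows:
        totals[LEVELS[level](ts)] += units
    return dict(totals)

print(aggregate(facts, "quarter"))
print(aggregate(facts, "year"))
```

Moving from "month" to "quarter" to "year" is a roll-up along the hierarchy; going the other way is a drill-down.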

MOLAP vs. ROLAP
- MOLAP = Multidimensional OLAP
  - Store data cube as multidimensional array
  - (Usually) pre-compute all aggregates
- Advantages:
  - Very efficient data access, hence fast answers
- Disadvantages:
  - Doesn't scale to large numbers of dimensions
  - Requires special-purpose data store

Sparsity
- Imagine a data warehouse for Giant Eagle. Suppose the dimensions are: Customer, Product, Store, Day.
- If there are 100,000 customers, 10,000 products, 1,000 stores, and 1,000 days, the data cube has 1,000,000,000,000,000 cells!
- Fortunately, most cells are empty.
  - A given store doesn't sell every product on every day.
  - A given customer has never visited most of the stores.
  - A given customer has never purchased most products.
- Multi-dimensional arrays are not an efficient way to store sparse data.
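The cell count on the sparsity slide is just the product of the dimension sizes. The sketch below checks the arithmetic and contrasts it with a sparse layout that stores only non-empty cells; the two purchases are made up for illustration.

```python
import math

sizes = {"Customer": 100_000, "Product": 10_000, "Store": 1_000, "Day": 1_000}
dense_cells = math.prod(sizes.values())
print(f"{dense_cells:,}")

# A sparse representation stores only non-empty cells, keyed by coordinates.
# Hypothetical purchases: (customer, product, store, day) -> units.
sparse_cube = {
    (17, 4021, 3, 250): 2,
    (17, 88, 3, 250): 1,
}
print(len(sparse_cube), "cells stored instead of", f"{dense_cells:,}")
```

A dense multidimensional array would reserve space for all 10^15 cells regardless of how few are populated.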

MOLAP vs. ROLAP
- ROLAP = Relational OLAP
  - Store data cube in relational database
  - Express queries in SQL
- Advantages:
  - Scales well to high dimensionality
  - Scales well to large data sets
  - Sparsity is not a problem
  - Uses well-known, mature technology
- Disadvantages:
  - Query performance is slower than MOLAP
  - Need to construct explicit indexes

Creating a Cross-tab with SQL
The grouping attributes (state, month) and the measurement (SUM(quantity)) go in the SELECT list; filters go in the WHERE clause, which must come before GROUP BY.

    SELECT state, month, SUM(quantity)
    FROM sales
    WHERE color = 'Red'
    GROUP BY state, month

What about the totals?
- An SQL aggregation query with GROUP BY does not produce subtotals or totals.
- Our cross-tab report is incomplete.

Query result:
  State  Month  SUM
  CA     Jul     45
  CA     Aug     50
  CA     Sep     38
  OR     Jul     33
  OR     Aug     36
  OR     Sep     31
  WA     Jul     30
  WA     Aug     42
  WA     Sep     40

Number of Autos Sold
          CA    OR    WA   Total
  Jul     45    33    30     ?
  Aug     50    36    42     ?
  Sep     38    31    40     ?
  Total    ?     ?     ?     ?

One solution: a big UNION ALL

    -- Original query
    SELECT state, month, SUM(quantity)
    FROM sales
    WHERE color = 'Red'
    GROUP BY state, month
    UNION ALL
    -- State subtotals
    SELECT state, 'ALL', SUM(quantity)
    FROM sales
    WHERE color = 'Red'
    GROUP BY state
    UNION ALL
    -- Month subtotals
    SELECT 'ALL', month, SUM(quantity)
    FROM sales
    WHERE color = 'Red'
    GROUP BY month
    UNION ALL
    -- Overall total
    SELECT 'ALL', 'ALL', SUM(quantity)
    FROM sales
    WHERE color = 'Red'
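The UNION ALL approach can be tried out with Python's built-in sqlite3 module. This is only a sketch: the table layout is assumed from the queries on the slides, the Red quantities are the ones in the slide's result table, and the single Blue row is a made-up row that the filter should exclude.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (state TEXT, month TEXT, color TEXT, quantity INTEGER)"
)
red = {
    ("CA", "Jul"): 45, ("CA", "Aug"): 50, ("CA", "Sep"): 38,
    ("OR", "Jul"): 33, ("OR", "Aug"): 36, ("OR", "Sep"): 31,
    ("WA", "Jul"): 30, ("WA", "Aug"): 42, ("WA", "Sep"): 40,
}
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, 'Red', ?)",
    [(s, m, q) for (s, m), q in red.items()],
)
# Hypothetical non-red row; WHERE color = 'Red' must filter it out.
conn.execute("INSERT INTO sales VALUES ('CA', 'Jul', 'Blue', 7)")

query = """
SELECT state, month, SUM(quantity) FROM sales
WHERE color = 'Red' GROUP BY state, month
UNION ALL
SELECT state, 'ALL', SUM(quantity) FROM sales
WHERE color = 'Red' GROUP BY state
UNION ALL
SELECT 'ALL', month, SUM(quantity) FROM sales
WHERE color = 'Red' GROUP BY month
UNION ALL
SELECT 'ALL', 'ALL', SUM(quantity) FROM sales
WHERE color = 'Red'
"""
rows = conn.execute(query).fetchall()
totals = {(s, m): q for s, m, q in rows}
print(len(rows), totals[("ALL", "ALL")])
```

The four branches fill in the detail cells, the state subtotals, the month subtotals and the grand total of the cross-tab.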

A better solution
- The UNION ALL solution gets cumbersome with more than 2 grouping attributes: n grouping attributes require 2^n parts in the union.
- The OLAP extensions added to SQL:1999 are more convenient: CUBE, ROLLUP.

    SELECT state, month, SUM(quantity)
    FROM sales
    WHERE color = 'Red'
    GROUP BY CUBE(state, month)

Results of the CUBE query
- Notice the use of NULL for totals.
- Subtotals at all levels.

  State  Month  SUM(quantity)
  CA     Jul     45
  CA     Aug     50
  CA     Sep     38
  CA     NULL   133
  OR     Jul     33
  OR     Aug     36
  OR     Sep     31
  OR     NULL   100
  WA     Jul     30
  WA     Aug     42
  WA     Sep     40
  WA     NULL   112
  NULL   Jul    108
  NULL   Aug    128
  NULL   Sep    109
  NULL   NULL   345
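To see where the sixteen rows of the CUBE result come from without needing a database that implements CUBE, the grouping can be emulated in plain Python. This is an illustrative sketch, not how a database evaluates the query; None plays the role of NULL, and the function name cube is an assumption.

```python
from collections import defaultdict
from itertools import combinations

# Red-car sales from the slide: (state, month, quantity).
rows = [
    ("CA", "Jul", 45), ("CA", "Aug", 50), ("CA", "Sep", 38),
    ("OR", "Jul", 33), ("OR", "Aug", 36), ("OR", "Sep", 31),
    ("WA", "Jul", 30), ("WA", "Aug", 42), ("WA", "Sep", 40),
]

def cube(rows, n_dims):
    """Emulate GROUP BY CUBE over the first n_dims fields; None marks a total."""
    result = defaultdict(int)
    for size in range(n_dims + 1):
        for kept in combinations(range(n_dims), size):
            for row in rows:
                key = tuple(row[i] if i in kept else None
                            for i in range(n_dims))
                result[key] += row[-1]
    return dict(result)

result = cube(rows, 2)
print(len(result), result[("CA", None)], result[(None, "Jul")],
      result[(None, None)])
```

The emulation assumes the dimension columns themselves contain no NULLs; otherwise a real NULL and a subtotal marker would be indistinguishable in the key.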

ROLLUP vs. CUBE
- CUBE computes the entire lattice.
- ROLLUP computes one path through the lattice.
  - Order of the GROUP BY list matters.
  - Groups by all prefixes of the GROUP BY list.
- GROUP BY ROLLUP(A,B,C) produces:
  - (A,B,C)
  - (A,B) subtotals
  - (A) subtotals
  - Total
- GROUP BY CUBE(A,B,C) produces:
  - (A,B,C)
  - Subtotals for the following: (A,B), (A,C), (B,C), (A), (B), (C)
  - Total

ROLLUP example

    SELECT color, month, state, SUM(quantity)
    FROM sales
    GROUP BY ROLLUP(color, month, state)

[Figure: the data cube lattice (State, Month, Color; State, Month; State, Color; Month, Color; State; Month; Color; Total). ROLLUP(color, month, state) covers the path (State, Month, Color), (Month, Color), (Color), Total.]
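The difference between the two operators comes down to which grouping sets they generate: prefixes for ROLLUP, all subsets for CUBE. A short sketch, with function names chosen for the example:

```python
from itertools import combinations

def rollup_groupings(cols):
    """ROLLUP groups by every prefix of the list, longest first."""
    return [tuple(cols[:k]) for k in range(len(cols), -1, -1)]

def cube_groupings(cols):
    """CUBE groups by every subset of the list."""
    return [c for k in range(len(cols), -1, -1)
            for c in combinations(cols, k)]

print(rollup_groupings(["color", "month", "state"]))
print(len(cube_groupings(["color", "month", "state"])))
```

For n columns ROLLUP yields n + 1 grouping sets while CUBE yields 2^n, which is why the order of the list matters only for ROLLUP.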

Star Schema
[Figure: star schema with the fact table Sales in the center, surrounded by the dimension tables Date, Promotion, Product and Store]

Dimension Tables
- Each one corresponds to a real-world object or concept.
  - Examples: Customer, Product, Date, Employee, Region, Store, Promotion, Vendor, Partner, Account, Department
- Properties of dimension tables:
  - Contain many descriptive columns
    - Dimension tables are wide (dozens of columns)
  - Generally don't have too many rows
    - At least in comparison to the fact tables
    - Usually < 1 million rows
  - Contents are relatively static
    - Almost like a lookup table
- Uses of dimension tables:
  - Filters are based on dimension attributes
  - Grouping columns are dimension attributes
  - Fact tables are referenced through dimensions

Fact Tables
- Each fact table contains measurements about a process of interest.
- Each fact row contains two things:
  - Numerical measure columns
  - Foreign keys to dimension tables
  - That's all!
- Properties of fact tables:
  - Very big
    - Often millions or billions of rows
  - Narrow
    - Small number of columns
  - Changes often
    - New events in the world lead to new rows in the fact table
    - Typically append-only
- Uses of fact tables:
  - Measurements are aggregated fact columns.
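A minimal star schema can be sketched with sqlite3 to show how a query filters and groups on dimension attributes while aggregating a fact column. All table names, column names and rows below are hypothetical; only two of the dimension tables are included to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: descriptive columns, few rows.
CREATE TABLE product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE store   (store_id   INTEGER PRIMARY KEY, city TEXT, state TEXT);
-- Fact table: foreign keys to the dimensions plus numeric measures.
CREATE TABLE sales (
    product_id INTEGER REFERENCES product(product_id),
    store_id   INTEGER REFERENCES store(store_id),
    quantity   INTEGER,
    amount     REAL
);
INSERT INTO product VALUES (1, 'Milk', 'Dairy'), (2, 'Bread', 'Bakery');
INSERT INTO store VALUES (1, 'Pittsburgh', 'PA'), (2, 'Columbus', 'OH');
INSERT INTO sales VALUES (1, 1, 3, 9.0), (2, 1, 1, 2.5), (1, 2, 2, 6.0);
""")

# Filter on one dimension attribute, group by another, aggregate a measure.
rows = conn.execute("""
SELECT s.state, SUM(f.quantity)
FROM sales f
JOIN product p ON p.product_id = f.product_id
JOIN store s   ON s.store_id = f.store_id
WHERE p.category = 'Dairy'
GROUP BY s.state
ORDER BY s.state
""").fetchall()
print(rows)
```

The fact table is only ever reached through joins on its foreign keys, which matches the slide's point that fact tables are referenced through dimensions.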