On-Line Application Processing

Similar documents
Warehousing. Data Mining

Most database operations involve On- Line Transaction Processing (OTLP).

CS 4604: Introduc0on to Database Management Systems. B. Aditya Prakash Lecture #21: Data Mining and Warehousing

Analytical data bases Database lectures for math

Introduction to Big Data. University of California, Berkeley School of Information IS 257: Database Management

Data Warehousing & Mining. Data integration. OLTP versus OLAP. CPS 116 Introduction to Database Systems

Deccansoft Software Services Microsoft Silver Learning Partner. SSAS Syllabus

CHAPTER 8 DECISION SUPPORT V2 ADVANCED DATABASE SYSTEMS. Assist. Prof. Dr. Volkan TUNALI

DATA MINING TRANSACTION

SQL Continued! Outerjoins, Aggregations, Grouping, Data Modification

Information Integration

Data Warehouse and Data Mining

Introduction to SQL. Multirelation Queries Subqueries. Slides are reused by the approval of Jeffrey Ullman s

Jarek Szlichta Acknowledgments: Jiawei Han, Micheline Kamber and Jian Pei, Data Mining - Concepts and Techniques

Data Warehousing and Data Mining. Announcements (December 1) Data integration. CPS 116 Introduction to Database Systems

Lectures for the course: Data Warehousing and Data Mining (IT 60107)

CSE 544 Principles of Database Management Systems. Alvin Cheung Fall 2015 Lecture 8 - Data Warehousing and Column Stores

Introduction to SQL. Select-From-Where Statements Multirelation Queries Subqueries. Slides are reused by the approval of Jeffrey Ullman s

Introduction to SQL. Select-From-Where Statements Multirelation Queries Subqueries

Tribhuvan University Institute of Science and Technology MODEL QUESTION

CS 1655 / Spring 2013! Secure Data Management and Web Applications

CS377: Database Systems Data Warehouse and Data Mining. Li Xiong Department of Mathematics and Computer Science Emory University

Introduction to SQL SELECT-FROM-WHERE STATEMENTS SUBQUERIES DATABASE SYSTEMS AND CONCEPTS, CSCI 3030U, UOIT, COURSE INSTRUCTOR: JAREK SZLICHTA

Decision Support Systems aka Analytical Systems

More SQL. Extended Relational Algebra Outerjoins, Grouping/Aggregation Insert/Delete/Update

CS 245: Database System Principles. Warehousing. Outline. What is a Warehouse? What is a Warehouse? Notes 13: Data Warehousing

A Novel Approach of Data Warehouse OLTP and OLAP Technology for Supporting Management prospective

collection of data that is used primarily in organizational decision making.

DATA WAREHOUSE EGCO321 DATABASE SYSTEMS KANAT POOLSAWASD DEPARTMENT OF COMPUTER ENGINEERING MAHIDOL UNIVERSITY

Data Warehouse and Data Mining

Data Warehousing & OLAP

Question Bank. 4) It is the source of information later delivered to data marts.

Data Mining Concepts & Techniques

CSE 544 Principles of Database Management Systems. Fall 2016 Lecture 14 - Data Warehousing and Column Stores

ETL and OLAP Systems

EXTENDED RELATIONAL ALGEBRA OUTERJOINS, GROUPING/AGGREGATION INSERT/DELETE/UPDATE

Data warehouses Decision support The multidimensional model OLAP queries

Overview. Introduction to Data Warehousing and Business Intelligence. BI Is Important. What is Business Intelligence (BI)?

Transactions, Views, Indexes. Controlling Concurrent Behavior Virtual and Materialized Views Speeding Accesses to Data

InfoSphere Warehouse V9.5 Exam.

Fig 1.2: Relationship between DW, ODS and OLTP Systems

signicantly higher than it would be if items were placed at random into baskets. For example, we

CHAPTER 3 Implementation of Data warehouse in Data Mining

Database design View Access patterns Need for separate data warehouse:- A multidimensional data model:-

UNIT -1 UNIT -II. Q. 4 Why is entity-relationship modeling technique not suitable for the data warehouse? How is dimensional modeling different?

OLAP2 outline. Multi Dimensional Data Model. A Sample Data Cube

This tutorial will help computer science graduates to understand the basic-to-advanced concepts related to data warehousing.

Data Warehousing and OLAP

DATA WAREHOUING UNIT I

Chapter 13 Business Intelligence and Data Warehouses The Need for Data Analysis Business Intelligence. Objectives

Grouping Operator. Applying γ L (R) Recall: Outerjoin. Example: Grouping/Aggregation. γ A,B,AVG(C)->X (R) =??

An Overview of Data Warehousing and OLAP Technology

Chapter 6 The database Language SQL as a tutorial

Lecture 18. Business Intelligence and Data Warehousing. 1:M Normalization. M:M Normalization 11/1/2017. Topics Covered

Syllabus. Syllabus. Motivation Decision Support. Syllabus

Views, Indexes, Authorization. Views. Views 8/6/18. Virtual and Materialized Views Speeding Accesses to Data Grant/Revoke Priviledges

Acknowledgment. MTAT Data Mining. Week 7: Online Analytical Processing and Data Warehouses. Typical Data Analysis Process.

Data Warehousing and Decision Support. Introduction. Three Complementary Trends. [R&G] Chapter 23, Part A

IT DATA WAREHOUSING AND DATA MINING UNIT-2 BUSINESS ANALYSIS

Knowledge Discovery & Data Mining

Data warehouse and Data Mining

CT75 DATA WAREHOUSING AND DATA MINING DEC 2015

CS614 - Data Warehousing - Midterm Papers Solved MCQ(S) (1 TO 22 Lectures)

Entity-Relationship Model. Purpose of E/R Model

ISM 50 - Business Information Systems


Data Warehousing Introduction. Toon Calders

ALTERNATE SCHEMA DIAGRAMMING METHODS DECISION SUPPORT SYSTEMS. CS121: Relational Databases Fall 2017 Lecture 22

Data Warehouse and Data Mining

Data Mining & Data Warehouse

Chapter 6 The database Language SQL as a tutorial

OLAP Introduction and Overview

CS 317/387. A Relation is a Table. Schemas. Towards SQL - Relational Algebra. name manf Winterbrew Pete s Bud Lite Anheuser-Busch Beers

Data Warehousing and Decision Support

TIM 50 - Business Information Systems

Basics of Dimensional Modeling

Data Warehousing ETL. Esteban Zimányi Slides by Toon Calders

TIM 50 - Business Information Systems

CHAPTER 8: ONLINE ANALYTICAL PROCESSING(OLAP)

Data Warehouses. Yanlei Diao. Slides Courtesy of R. Ramakrishnan and J. Gehrke

1. Inroduction to Data Mininig

High dim. data. Graph data. Infinite data. Machine learning. Apps. Locality sensitive hashing. Filtering data streams.

Time: 3 hours. Full Marks: 70. The figures in the margin indicate full marks. Answers from all the Groups as directed. Group A.

Data Mining and Data Warehousing Introduction to Data Mining

Sql Fact Constellation Schema In Data Warehouse With Example

Data Warehousing and Decision Support

Carnegie Mellon Univ. Dept. of Computer Science /615 DB Applications. Data mining - detailed outline. Problem

Call: SAS BI Course Content:35-40hours

Business Analytics and Big Data: the process and the tools

UNIT -1 UNIT -II. Q. 4 Why is entity-relationship modeling technique not suitable for the data warehouse? How is dimensional modeling different?

SQL DATA DEFINITION LANGUAGE

Constraints, Views & Indexes. Running Example. Kinds of Constraints INTEGRITY CONSTRAINTS. Keys Foreign key or referential integrity constraints

CSPP 53017: Data Warehousing Winter 2013! Lecture 7! Svetlozar Nestorov! Class News!

On-Line Analytical Processing (OLAP) Traditional OLTP

Chapter 2 The relational Model of data. Relational model introduction

Data mining - detailed outline. Carnegie Mellon Univ. Dept. of Computer Science /615 DB Applications. Problem.

Chapter 4 Data Mining A Short Introduction

BUSINESS INTELLIGENCE. SSAS - SQL Server Analysis Services. Business Informatics Degree

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

Evolution of Database Systems

Transcription:

On-Line Application Processing WAREHOUSING DATA CUBES DATA MINING 1

Overview Traditional database systems are tuned to many, small, simple queries. Some new applications use fewer, more time-consuming, analytic queries. New architectures have been developed to handle analytic queries efficiently. 2

The Data Warehouse The most common form of data integration. Copy sources into a single DB (warehouse) and try to keep it up-to-date. Usual method: periodic reconstruction of the warehouse, perhaps overnight. Frequently essential for analytic queries. 3

OLTP Most database operations involve On-Line Transaction Processing (OLTP). Short, simple, frequent queries and/or modifications, each involving a small number of tuples. Examples: Answering queries from a Web interface, sales at cash registers, selling airline tickets. 4

OLAP On-Line Application Processing (OLAP, or analytic ) queries are, typically: Few, but complex queries --- may run for hours. Queries do not depend on having an absolutely up-to-date database. 5

OLAP Examples 1. Amazon analyzes purchases by its customers to come up with an individual screen with products of likely interest to the customer. 2. Analysts at Wal-Mart look for items with increasing sales in some region. Use empty trucks to move merchandise between stores. 6

Common Architecture Databases at store branches handle OLTP. Local store databases copied to a central warehouse overnight. Analysts use the data warehouse for OLAP. 7

Star Schemas A star schema is a common organization for data at a warehouse. It consists of: 1. Fact table : a very large accumulation of facts such as sales. Often insert-only. 2. Dimension tables : smaller, generally static information about the entities involved in the facts. e.g., information about products. 8

Example: Star Schema Suppose we want to record in a warehouse information about every beer sale: the bar, the brand of beer, the drinker who bought the beer, the day, the time, and the price charged. The fact table is a relation: Sales(bar, beer, drinker, day, time, price) 9

Example -- Continued The dimension tables include information about the bar, beer, and drinker dimensions : Bars(bar, addr, license) Beers(beer, manf) Drinkers(drinker, addr, phone)... 10

Visualization Star Schema Dimension Table (Bars) Dimension Table (Drinkers) Dimension Attrs. Dependent Attrs. Fact Table - Sales Dimension Table (Beers) Dimension Table (etc.) 11

Dimensions and Dependent Attributes Two classes of fact-table attributes: 1. Dimension attributes : the foreign key of a dimension table. 2. Dependent attributes : a value determined by the dimension attributes of the tuple. 12

Example: Dependent Attribute price is the dependent attribute of our example Sales relation. It is determined by the combination of dimension attributes: bar, beer, drinker, and the time (combination of day and time-of-day attributes). 13

Approaches to Building Warehouses 1. ROLAP = relational OLAP : Tune a relational DBMS to support star schemas. 2. MOLAP = multidimensional OLAP : Use a specialized DBMS with a model such as the data cube. 14

MOLAP and Data Cubes Keys of dimension tables are the dimensions of a hypercube. Example: for the Sales data, the four dimensions are bar, beer, drinker, and time. Dependent attributes (e.g., price) appear at the points of the cube. 15

Visualization -- Data Cubes beer price bar drinker 16

Marginals The data cube also includes aggregation (typically SUM) along the margins of the cube. The marginals include aggregations over one dimension, two dimensions, 17

Visualization --- Data Cube w/aggregation beer price bar drinker 18

Example: Marginals Our 4-dimensional Sales cube includes the sum of price over each bar, each beer, each drinker, and each time unit (perhaps days). It would also have the sum of price over all bar-beer pairs, all bardrinker-day triples, 19

Structure of the Cube Think of each dimension as having an additional value *. A point with one or more * s in its coordinates aggregates over the dimensions with the * s. Example: Sales( Joe s Bar, Bud, *, *) holds the sum, over all drinkers and all time, of the Bud consumed at Joe s. 20

Drill-Down Drill-down = de-aggregate = break an aggregate into its constituents. Example: having determined that Joe s Bar sells very few Anheuser- Busch beers, break down his sales by particular A.-B. beer. 21

Roll-Up Roll-up = aggregate along one or more dimensions. Example: given a table of how much Bud each drinker consumes at each bar, roll it up into a table giving total amount of Bud consumed by each drinker (at all bars). 22

Example: Roll Up and Drill Down $ of Anheuser-Busch by drinker/bar Joe s Bar Nut- House Blue Chalk Jim Bob Mary 45 33 30 50 36 42 38 31 40 Roll up by Bar $ of A-B / drinker Jim 133 Bob 100 Mary 112 $ of A-B Beers / drinker Jim Bob Drill down by Beer Mary Bud 40 29 40 M lob 45 31 37 Bud Light 48 40 35 23

Data Mining Data mining is a popular term for queries that summarize big data sets in useful ways. Examples: 1. Clustering all Web pages by topic. 2. Finding characteristics of fraudulent credit-card use. 24

Market-Basket Data An important form of mining from relational data involves market baskets = sets of items that are purchased together as a customer leaves a store. Summary of basket data is frequent itemsets = sets of items that often appear together in baskets. 25

Example: Market Baskets If people often buy hamburger and ketchup together, the store can: 1. Put hamburger and ketchup near each other and put potato chips between. 2. Run a sale on hamburger and raise the price of ketchup. 26

Finding Frequent Pairs The simplest case is when we only want to find frequent pairs of items. Assume data is in a relation Baskets(basket, item). The support threshold s is the minimum number of baskets in which a pair appears. 27

Frequent Pairs in SQL SELECT b1.item, b2.item FROM Baskets b1, Baskets b2 WHERE b1.basket = b2.basket AND b1.item < b2.item GROUP BY b1.item, b2.item HAVING COUNT(*) >= s; Throw away pairs of items that do not appear at least s times. Look for two Basket tuples with the same basket and different items. First item must precede second, so we don t count the same pair twice. Create a group for each pair of items that appears in at least one basket. 28

A-Priori Trick (1) Straightforward implementation involves a join of a huge Baskets relation with itself. The a-priori algorithm speeds the query by recognizing that a pair of items {i, j } cannot have support s unless both {i } and {j } do. 29

A-Priori Trick (2) Use a materialized view to hold only information about frequent items. INSERT INTO Baskets1(basket, item) SELECT * FROM Baskets WHERE item IN ( SELECT item FROM Baskets GROUP BY item HAVING COUNT(*) >= s ); Items that appear in at least s baskets. 30

A-Priori Algorithm 1. Materialize the view Baskets1. 2. Run the obvious query, but on Baskets1 instead of Baskets. Computing Baskets1 is cheap, since it doesn t involve a join. Baskets1 probably has many fewer tuples than Baskets. Running time shrinks with the square of the number of tuples involved in the join. 31

Example: A-Priori Suppose: 1. A supermarket sells 10,000 items. 2. The average basket has 10 items. 3. The support threshold is 1% of the baskets. At most 1/10 of the items can be frequent. Factor 4 speedup. 32

Course Plug Big Data Analytics Winter 2018 33

Actions Read chapters: Information Integration (21.1, 21.2) and Data Mining (22.1, 22.2). 34